Abstract. Causal mediation analysis is routinely conducted by applied researchers in a variety of disciplines. The goal of such an analysis is to investigate alternative causal mechanisms by examining the roles of intermediate variables that lie in the causal paths between the treatment and outcome variables. In this paper we first prove that under a particular version of the sequential ignorability assumption, the average causal mediation effect (ACME) is nonparametrically identified. We compare our identification assumption with those proposed in the literature. Some practical implications of our identification result are also discussed. In particular, the popular estimator based on the linear structural equation model (LSEM) can be interpreted as an ACME estimator once additional parametric assumptions are made. We show that these assumptions can easily be relaxed within and outside of the LSEM framework and propose simple nonparametric estimation strategies. Second, and perhaps most importantly, we propose a new sensitivity analysis that can be easily implemented by applied researchers within the LSEM framework. Like the existing identifying assumptions, the proposed sequential ignorability assumption may be too strong in many applied settings. Thus, sensitivity analysis is essential in order to examine the robustness of empirical findings to the possible existence of an unmeasured confounder. Finally, we apply the proposed methods to a randomized experiment from political psychology. We also make easy-to-use software available to implement the proposed methods.

Key words and phrases: Causal inference, causal mediation analysis, 
direct and indirect effects, linear structural equation models, sequential 
ignorability, unmeasured confounders. 



1. INTRODUCTION 

Causal mediation analysis is routinely conducted by applied researchers in a variety of scientific disciplines including epidemiology, political science, psychology and sociology (see MacKinnon, 2008). The goal of such an analysis is to investigate causal mechanisms by examining the role of intermediate variables thought to lie in the causal path between the treatment and outcome variables. Over fifty years ago, Cochran (1957) pointed to both the possibility and difficulty of using covariance analysis to explore causal mechanisms by stating: "Sometimes these averages have no physical or biological meaning of interest to the investigator, and sometimes they do not have the meaning that is ascribed to them at first glance" (page 267). Recently, a number of statisticians have taken up Cochran's challenge. Robins and Greenland (1992) initiated a formal study of causal mediation analysis, and a number of articles have appeared in more recent years (e.g., Pearl, 2001; Robins, 2003; Rubin, 2004; Petersen, Sinisi and van der Laan, 2006; Geneletti, 2007; Joffe, Small and Hsu, 2007; Ten Have et al., 2007; Albert, 2008; Jo, 2008; Joffe et al., 2008; Sobel, 2008; VanderWeele, 2008, 2009; Glynn, 2010).

What do we mean by a causal mechanism? The aforementioned paper by Cochran gives the following example. In a randomized experiment, researchers study the causal effects of various soil fumigants on eelworms that attack farm crops. They observe that these soil fumigants increase oats yields but wish to know whether the reduction of eelworms represents an intermediate phenomenon that mediates this effect. In fact, many scientists across various disciplines are not only interested in causal effects but also in causal mechanisms because competing scientific theories often imply that different causal paths underlie the same cause-effect relationship.

In this paper we contribute to this fast-growing literature in several ways. After briefly describing our motivating example in the next section, we prove in Section 3 that under a particular version of the sequential ignorability assumption, the average causal mediation effect (ACME) is nonparametrically identified. We compare our identifying assumption with those proposed in the literature, and discuss practical implications of our identification result. In particular, Baron and Kenny's (1986) popular estimator (Google Scholar records over 17 thousand citations for this paper), which is based on a linear structural equation model (LSEM), can be interpreted as an ACME estimator under the proposed assumption if additional parametric assumptions are satisfied. We show that these additional assumptions can be easily relaxed within and outside of the LSEM framework. In particular, we propose a simple nonparametric estimation strategy in Section 4. We conduct a Monte Carlo experiment to investigate the finite-sample performance of the proposed nonparametric estimator and its asymptotic confidence interval.



Like many identification assumptions, the proposed assumption may be too strong for the typical situations in which causal mediation analysis is employed. For example, in experiments where the treatment is randomized but the mediator is not, the ignorability of the treatment assignment holds but the ignorability of the mediator may not. In Section 5 we propose a new sensitivity analysis that can be implemented by applied researchers within the standard LSEM framework. This method directly evaluates the robustness of empirical findings to the possible existence of unmeasured pre-treatment variables that confound the relationship between the mediator and the outcome. Because the sequential ignorability assumption cannot be directly tested even in randomized experiments, sensitivity analysis must play an essential role in causal mediation analysis. Finally, in Section 6 we apply the proposed methods to the empirical example, to which we now turn.

2. AN EXAMPLE FROM THE SOCIAL SCIENCES

Since the influential article by Baron and Kenny (1986), mediation analysis has been frequently used in the social sciences, and in psychology in particular. A central goal of these disciplines is to identify causal mechanisms underlying human behavior and opinion formation. In a typical psychological experiment, researchers randomly administer certain stimuli to subjects and compare treatment group behavior or opinions with those in the control group. However, to directly test psychological theories, estimating the causal effects of the stimuli is typically not sufficient. Instead, researchers choose to investigate psychological factors such as cognition and emotion that mediate causal effects in order to explain why individuals respond to a certain stimulus in a particular way. Another difficulty faced by many researchers is their inability to directly manipulate psychological constructs. It is in this context that causal mediation analysis plays an essential role in social science research.

In Section 6 we apply our methods to an influential randomized experiment from political psychology. Nelson, Clawson and Oxley (1997) examine how the framing of political issues by the news media affects citizens' political opinions. While the authors are not the first to use causal mediation analysis in political science, their study is one of the most well-known examples in political psychology and also represents a typical application of causal mediation analysis in the social sciences. Media framing is the process by which news organizations define a political issue or emphasize particular aspects of that issue. The authors hypothesize that differing frames for the same news story alter citizens' political tolerance by affecting more general political attitudes. They conducted a randomized experiment to test this mediation hypothesis.

Specifically, Nelson, Clawson and Oxley (1997) used two different local newscasts about a Ku Klux Klan rally held in central Ohio. In the experiment, student subjects were randomly assigned to watch the nightly news from two different local news channels. The two news clips were identical except for the final story on the Klan rally. In one newscast, the Klan rally was presented as a free speech issue. In the second newscast, the journalists presented the Klan rally as a disruption of public order that threatened to turn violent. The outcome was measured using two different scales of political tolerance. Immediately after viewing the news broadcast, subjects were asked two seven-point scale questions measuring their tolerance for the Klan speeches and rallies. The hypothesis was that the causal effects of the media frame on tolerance are mediated by subjects' attitudes about the importance of free speech and the maintenance of public order. In other words, the media frame influences subjects' attitudes toward the Ku Klux Klan by encouraging them to consider the Klan rally as an event relevant for the general issue of free speech or public order. The researchers used additional survey questions and a scaling method to measure these hypothesized mediating factors after the experiment was conducted.



Table 1 reports descriptive statistics for these mediator variables as well as the treatment and outcome variables. The sample size is 136, with 67 subjects exposed to the free speech frame and 69 subjects assigned to the public order frame. As is clear from the last column, the media frame treatment appears to influence both types of response variables in the expected directions. For example, being exposed to the public order frame as opposed to the free speech frame significantly increased the subjects' perceived importance of public order, while decreasing the importance of free speech (although the latter effect is not statistically significant). Moreover, the public order treatment decreased the subjects' tolerance toward the Ku Klux Klan speech in the news clips compared to the free speech frame.

It is important to note that the researchers in this example are primarily interested in the causal mechanism between media framing and political tolerance rather than various causal effects given in the last column of Table 1. Indeed, in many social science experiments, researchers' interest lies in the identification of causal mediation effects rather than the total causal effect or controlled direct effects (these terms are formally defined in the next section). Causal mediation analysis is particularly appealing in such situations.

One crucial limitation of this study, however, is that, as in many other psychological experiments, the original researchers were able to randomize only the news stories, not the subjects' attitudes. This implies that there are likely to be unobserved covariates that confound the relationship between the mediator and the outcome. As we formally show in the next section, the existence of such confounders represents a violation of a key assumption for identifying the causal mechanism. For example, it is possible that subjects' underlying political ideology affects both their public order attitude and their tolerance for the Klan rally within each treatment condition. This scenario is of particular concern since it is well established that politically conservative citizens tend to be more concerned about public order issues and also, in some instances, to be more sympathetic to groups like the Klan. In Section 5 we propose a new sensitivity analysis that partially addresses such concerns.

Table 1

Descriptive statistics and estimated average treatment effects from the media framing experiment. The middle four columns show the means and standard deviations of the mediator and outcome variables for each treatment group. The last column reports the estimated average causal effects of the public order frame as opposed to the free speech frame on the three response variables along with their standard errors. The estimates suggest that the treatment affected each of these variables in the expected directions.

Importance of free speech
Importance of public order
Tolerance for the KKK
5.25
5.43
2.59
1.43
4.75
3.13
-0.231 (0.239)

3. IDENTIFICATION 

In this section we propose a new nonparametric identification assumption for the ACME and discuss its practical implications. We also compare the proposed assumption with those available in the literature.

3.1 The Framework 

Consider a simple random sample of size $n$ from a population where for each unit $i$ we observe $(T_i, M_i, X_i, Y_i)$. We use $T_i$ to denote the binary treatment variable where $T_i = 1$ ($T_i = 0$) implies unit $i$ receives (does not receive) the treatment. The mediating variable of interest, that is, the mediator, is represented by $M_i$, whereas $Y_i$ represents the outcome variable. Finally, $X_i$ denotes the vector of observed pre-treatment covariates, and we use $\mathcal{M}$, $\mathcal{X}$ and $\mathcal{Y}$ to denote the support of the distributions of $M_i$, $X_i$ and $Y_i$, respectively.

What qualifies as a mediator? Since the mediator lies in the causal path between the treatment and the outcome, it must be a post-treatment variable that occurs before the outcome is realized. Beyond this minimal requirement, what constitutes a mediator is determined solely by the scientific theory under investigation. Consider the following example, which is motivated by a referee's comment. Suppose that the treatment is parents' decision to have their child receive the live vaccine for H1N1 flu virus and the outcome is whether the child develops flu or not. For a virologist, a mediator of interest may be the development of antibodies to H1N1 live vaccine. But, if parents sign a form acknowledging the risks of the vaccine, can this act of form signing also be a mediator? Indeed, social scientists (if not virologists!) may hypothesize that being informed of the risks will make parents less likely to have their child receive the second dose of the vaccine, thereby increasing the risk of developing flu. This example highlights the important role of scientific theories in causal mediation analysis.

To define the causal mediation effects, we use the potential outcomes framework. Let $M_i(t)$ denote the potential value of the mediator for unit $i$ under the treatment status $T_i = t$. Similarly, we use $Y_i(t, m)$ to represent the potential outcome for unit $i$ when $T_i = t$ and $M_i = m$. Then, the observed variables can be written as $M_i = M_i(T_i)$ and $Y_i = Y_i(T_i, M_i(T_i))$. Similarly, if the mediator takes $J$ different values, there exist $2J$ potential values of the outcome variable, only one of which can be observed.

Using the potential outcomes notation, we can define the causal mediation effect for unit $i$ under treatment status $t$ as (see Robins and Greenland, 1992; Pearl, 2001)

$$\delta_i(t) = Y_i(t, M_i(1)) - Y_i(t, M_i(0)) \tag{1}$$

for $t = 0, 1$. Pearl (2001) called $\delta_i(t)$ the natural indirect effect, while Robins (2003) used the terms pure indirect effect for $\delta_i(0)$ and total indirect effect for $\delta_i(1)$. In words, $\delta_i(t)$ represents the difference between the potential outcome that would result under treatment status $t$, and the potential outcome that would occur if the treatment status is the same and yet the mediator takes a value that would result under the other treatment status. Note that the former is observable (if the treatment variable is actually equal to $t$), whereas the latter is by definition unobservable [under the treatment status $t$ we never observe $M_i(1 - t)$]. Some feel uncomfortable with the idea of making inferences about quantities that can never be observed (e.g., Rubin, 2005, page 325), while others emphasize their importance in policy making and scientific research (Pearl, 2001, Section 2.4, 2010, Section 6.1.4; Hafeman and Schwartz, 2009).
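
As a concrete illustration of definition (1), the Python sketch below evaluates $\delta_i(t)$ for a single hypothetical unit. The potential mediator values in `M_POT` and the outcome function `Y` are invented for illustration and are not taken from the media framing data; with real data only one term of each difference could ever be observed.

```python
# Hypothetical unit i. Both potential mediator values are known here only
# because they are invented; in real data only M_i(T_i) is observed.
M_POT = {0: 2.0, 1: 5.0}  # M_i(0), M_i(1)


def Y(t: int, m: float) -> float:
    """Hypothetical potential outcome Y_i(t, m) with a treatment-mediator interaction."""
    return 1.0 + 0.5 * t + 0.4 * m + 0.2 * t * m


def delta(t: int) -> float:
    """Unit-level causal mediation effect, equation (1)."""
    return Y(t, M_POT[1]) - Y(t, M_POT[0])


if __name__ == "__main__":
    # If T_i = 1, Y_i(1, M_i(1)) is observed but Y_i(1, M_i(0)) never is.
    print(f"delta_i(0) = {delta(0):.2f}")  # 0.4 * (5 - 2) = 1.20
    print(f"delta_i(1) = {delta(1):.2f}")  # (0.4 + 0.2) * (5 - 2) = 1.80
```

Because the hypothetical `Y` contains a $t \times m$ term, $\delta_i(1)$ differs from $\delta_i(0)$; dropping that term would make the two coincide.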

Furthermore, the above notation implicitly assumes that the potential outcome depends only on the values of the treatment and mediating variables and, in particular, not on how they are realized. For example, this assumption would be violated if the outcome variable responded to the value of the mediator differently depending on whether it was directly assigned or occurred as a natural response to the treatment. Formally, the assumption requires that for $t = 0, 1$ and all $m \in \mathcal{M}$, $Y_i(t, M_i(t)) = Y_i(t, M_i(1 - t)) = Y_i(t, m)$ if $M_i(1) = M_i(0) = m$.

Thus, equation (1) formalizes the idea that the mediation effects represent the indirect effects of the treatment through the mediator. In this paper we focus on the identification and inference of the average causal mediation effect (ACME), which is defined as

$$\bar{\delta}(t) = \mathbb{E}(\delta_i(t)) = \mathbb{E}\{Y_i(t, M_i(1)) - Y_i(t, M_i(0))\} \tag{2}$$

for $t = 0, 1$. In the potential outcomes framework, the causal effect of the treatment on the outcome for unit $i$ is defined as $\tau_i = Y_i(1, M_i(1)) - Y_i(0, M_i(0))$, which is typically called the total causal effect. Therefore, the causal mediation effect and the total causal effect have the following relationship:

$$\tau_i = \delta_i(t) + \zeta_i(1 - t), \tag{3}$$

where $\zeta_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t))$ for $t = 0, 1$. This quantity $\zeta_i(t)$ is called the natural direct effect by Pearl (2001) and the pure/total direct effect by Robins (2003). This represents the causal effect of the treatment on the outcome when the mediator is set to the potential value that would occur under treatment status $t$. In other words, $\zeta_i(t)$ is the direct effect of the treatment when the mediator is held constant. Equation (3) shows an important relationship where the total causal effect is equal to the sum of the mediation effect under one treatment condition and the natural direct effect under the other treatment condition. Clearly, this equality also holds for the average total causal effect so that $\bar{\tau} = \mathbb{E}\{Y_i(1, M_i(1)) - Y_i(0, M_i(0))\} = \bar{\delta}(t) + \bar{\zeta}(1 - t)$ for $t = 0, 1$ where $\bar{\zeta}(t) = \mathbb{E}(\zeta_i(t))$.
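
Equation (3) and its averaged version are algebraic identities, so they can be checked on any set of potential outcomes. The sketch below does so for four hypothetical units; the intercepts, potential mediator values and outcome function are made up for illustration, and the sample means stand in for $\bar{\delta}(t)$, $\bar{\zeta}(t)$ and $\bar{\tau}$.

```python
from statistics import mean

# Hypothetical units: (intercept a_i, M_i(0), M_i(1)); all values invented.
UNITS = [(0.0, 2.0, 5.0), (1.0, 1.0, 1.5), (-0.5, 3.0, 2.0), (2.0, 0.0, 4.0)]


def Y(a: float, t: int, m: float) -> float:
    """Hypothetical potential outcome Y_i(t, m) for a unit with intercept a."""
    return a + 0.5 * t + 0.4 * m + 0.2 * t * m


def effects(unit):
    """Return (delta_i(0), delta_i(1), zeta_i(0), zeta_i(1), tau_i) for one unit."""
    a, m0, m1 = unit
    m_pot = {0: m0, 1: m1}
    delta = {t: Y(a, t, m1) - Y(a, t, m0) for t in (0, 1)}             # equation (1)
    zeta = {t: Y(a, 1, m_pot[t]) - Y(a, 0, m_pot[t]) for t in (0, 1)}  # natural direct effect
    tau = Y(a, 1, m1) - Y(a, 0, m0)                                     # total causal effect
    return delta[0], delta[1], zeta[0], zeta[1], tau


rows = [effects(u) for u in UNITS]
# Averages over units play the roles of delta_bar(t), zeta_bar(t) and tau_bar.
d0, d1, z0, z1, tau_bar = (mean(col) for col in zip(*rows))

print(f"tau_bar                    = {tau_bar:.3f}")
print(f"delta_bar(1) + zeta_bar(0) = {d1 + z0:.3f}")
print(f"delta_bar(0) + zeta_bar(1) = {d0 + z1:.3f}")
```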

The causal mediation effects and natural direct effects differ from the controlled direct effect of the mediator, that is, $Y_i(t, m) - Y_i(t, m')$ for $t = 0, 1$ and $m \neq m'$, and that of the treatment, that is, $Y_i(1, m) - Y_i(0, m)$ for all $m \in \mathcal{M}$ (Pearl, 2001; Robins, 2003). Unlike the mediation effects, the controlled direct effects of the mediator are defined in terms of specific values of the mediator, $m$ and $m'$, rather than its potential values, $M_i(1)$ and $M_i(0)$. While causal mediation analysis is used to identify possible causal paths from $T_i$ to $Y_i$, the controlled direct effects may be of interest, for example, if one wishes to understand how the causal effect of $M_i$ on $Y_i$ changes as a function of $T_i$. In other words, the former examines whether $M_i$ mediates the causal relationship between $T_i$ and $Y_i$, whereas the latter investigates whether $T_i$ moderates the causal effect of $M_i$ on $Y_i$ (Baron and Kenny, 1986).
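
The distinction between mediation and moderation can be seen with a hypothetical outcome function that includes a treatment-mediator interaction. In the sketch below (the function `Y` and the values of $M_i(0)$ and $M_i(1)$ are assumptions for illustration only), the controlled direct effect of the mediator changes with $t$, which is moderation, whereas $\delta_i(1)$ plugs in the unit's own potential mediator values.

```python
def Y(t: int, m: float) -> float:
    """Hypothetical potential outcome Y_i(t, m); the t*m term lets T moderate M's effect."""
    return 1.0 + 0.5 * t + 0.4 * m + 0.2 * t * m


def cde_mediator(t: int, m: float, m_prime: float) -> float:
    """Controlled direct effect of the mediator: Y_i(t, m) - Y_i(t, m') with t held fixed."""
    return Y(t, m) - Y(t, m_prime)


def cde_treatment(m: float) -> float:
    """Controlled direct effect of the treatment: Y_i(1, m) - Y_i(0, m) with m held fixed."""
    return Y(1, m) - Y(0, m)


if __name__ == "__main__":
    # Moderation: the effect of moving M from 0 to 1 depends on t.
    print(f"{cde_mediator(0, 1.0, 0.0):.2f} vs {cde_mediator(1, 1.0, 0.0):.2f}")
    # The treatment's controlled direct effect depends on the level m.
    print(f"{cde_treatment(0.0):.2f} vs {cde_treatment(3.0):.2f}")
    # Mediation instead uses potential values (hypothetical M_i(0) = 2, M_i(1) = 5).
    print(f"delta_i(1) = {Y(1, 5.0) - Y(1, 2.0):.2f}")
```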



3.2 The Main Identification Result 

We now present our main identification result using the potential outcomes framework described above. We show that under a particular version of the sequential ignorability assumption, the ACME is nonparametrically identified. We first define our identifying assumption:

Assumption 1 (Sequential ignorability).

$$\{Y_i(t', m), M_i(t)\} \perp\!\!\!\perp T_i \mid X_i = x, \tag{4}$$

$$Y_i(t', m) \perp\!\!\!\perp M_i(t) \mid T_i = t, X_i = x \tag{5}$$

for $t, t' = 0, 1$, and all $x \in \mathcal{X}$ where it is also assumed that $0 < \Pr(T_i = t \mid X_i = x)$ and $0 < p(M_i(t) = m \mid T_i = t, X_i = x)$ for $t = 0, 1$, and all $x \in \mathcal{X}$ and $m \in \mathcal{M}$.

Thus, the treatment is first assumed to be ignorable given the pre-treatment covariates, and then the mediator variable is assumed to be ignorable given the observed value of the treatment as well as the pre-treatment covariates. We emphasize that, unlike the standard sequential ignorability assumption in the literature (e.g., Robins, 1999), the conditional independence given in equation (5) of Assumption 1 must hold without conditioning on the observed values of post-treatment confounders. This issue is discussed further below.
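
To see why equation (5) matters even when the treatment is randomized, consider a hypothetical simulation of the scenario described in Section 2, in which an unobserved pre-treatment variable `u` (standing in for something like political ideology) affects both the mediator and the outcome. All coefficients are invented, and `ols`, `simulate` and `product_acme` are small helpers written for this sketch rather than library functions. The estimator is the product of coefficients $\hat\beta_2\hat\gamma$ from the linear models discussed in Section 3.4; here the true ACME is $0.5 \times 0.8 = 0.4$.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [x - f * z for x, z in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20_000, seed=7):
    """Randomized T; unobserved u shifts both M and Y. True ACME = 0.5 * 0.8 = 0.4."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = rng.gauss(0, 1)                          # e.g., political ideology
        t = rng.randint(0, 1)                        # randomized treatment: (4) holds
        m = 0.5 * t + u + rng.gauss(0, 1)            # mediator depends on u
        y = 1.0 * t + 0.8 * m + u + rng.gauss(0, 1)  # outcome too: (5) fails unless u is conditioned on
        data.append((t, m, u, y))
    return data


def product_acme(data, adjust_for_u):
    """beta2_hat * gamma_hat from the mediator and outcome regressions."""
    def cov(u):
        # Optionally treat u as an observed pre-treatment covariate X_i.
        return [u] if adjust_for_u else []

    beta2 = ols([[1.0, t] + cov(u) for t, _, u, _ in data],
                [m for _, m, _, _ in data])[1]
    gamma = ols([[1.0, t, m] + cov(u) for t, m, u, _ in data],
                [y for _, _, _, y in data])[2]
    return beta2 * gamma


if __name__ == "__main__":
    data = simulate()
    print(f"ignoring u:      {product_acme(data, adjust_for_u=False):.3f}")
    print(f"adjusting for u: {product_acme(data, adjust_for_u=True):.3f}")
```

In this setup the unadjusted product should settle near 0.65 rather than 0.4, because within each treatment arm `u` raises both `m` and `y`. Treating `u` as an observed pre-treatment covariate removes the bias; when no such variable is measured, this is the situation the sensitivity analysis of Section 5 is designed for.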

The following theorem presents our main identification result, showing that under this assumption the ACME is nonparametrically identified.

Theorem 1 (Nonparametric identification). Under Assumption 1, the ACME and the average natural direct effects are nonparametrically identified as follows for $t = 0, 1$:

$$\bar{\delta}(t) = \int\!\!\int \mathbb{E}(Y_i \mid M_i = m, T_i = t, X_i = x)\,\{dF_{M_i \mid T_i = 1, X_i = x}(m) - dF_{M_i \mid T_i = 0, X_i = x}(m)\}\,dF_{X_i}(x),$$

$$\bar{\zeta}(t) = \int\!\!\int \{\mathbb{E}(Y_i \mid M_i = m, T_i = 1, X_i = x) - \mathbb{E}(Y_i \mid M_i = m, T_i = 0, X_i = x)\}\,dF_{M_i \mid T_i = t, X_i = x}(m)\,dF_{X_i}(x),$$

where $F_Z(\cdot)$ and $F_{Z \mid W}(\cdot)$ represent the distribution function of a random variable $Z$ and the conditional distribution function of $Z$ given $W$.
A proof is given in Appendix A. Theorem 1 is quite general and can be easily extended to any type of treatment regime, for example, a continuous treatment variable. In fact, the proof requires no change except letting $t$ and $t'$ take values other than 0 and 1. Assumption 1 can also be somewhat relaxed by replacing equation (5) with its corresponding mean independence assumption. However, as mentioned above, this identification result does not hold under the standard sequential ignorability assumption. As shown by Avin, Shpitser and Pearl (2005) and also pointed out by Robins (2003), the nonparametric identification of natural direct and indirect effects is not possible without an additional assumption if equation (5) holds only after conditioning on the post-treatment confounders $Z_i$ as well as the pre-treatment covariates $X_i$, that is, $Y_i(t', m) \perp\!\!\!\perp M_i(t) \mid T_i = t, Z_i = z, X_i = x$, for $t, t' = 0, 1$, and all $x \in \mathcal{X}$ and $z \in \mathcal{Z}$ where $\mathcal{Z}$ is the support of $Z_i$. This is an important limitation since assuming the absence of post-treatment confounders may not be credible in many applied settings. In some cases, however, it is possible to address the main source of confounding by conditioning on pre-treatment variables alone (see Section 6 for an example).
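
For a discrete mediator and a discrete covariate, the integrals in Theorem 1 become sums, and replacing each conditional mean and probability with its sample analogue gives a simple plug-in estimator. The sketch below is one such version, assuming every $(T_i, M_i, X_i)$ cell is non-empty; it may differ in details from the estimation strategy proposed in Section 4. The data-generating process in the hypothetical `simulate` helper satisfies Assumption 1 by construction, with true ACMEs $\bar{\delta}(0) = 0.8$ and $\bar{\delta}(1) = 1.4$.

```python
import random
from collections import defaultdict


def outcome(t, m, x, eps):
    """Hypothetical potential outcome Y_i(t, m); the t*m term makes delta(0) != delta(1)."""
    return 1 + t + 2 * m + x + 1.5 * t * m + eps


def simulate(n, seed=1):
    """Draw data satisfying Assumption 1 and return it with the sample ACMEs."""
    rng = random.Random(seed)
    data, d0, d1 = [], 0.0, 0.0
    for _ in range(n):
        x = rng.randint(0, 1)                   # observed pre-treatment covariate
        t = int(rng.random() < 0.3 + 0.4 * x)   # treatment ignorable given X
        v = rng.random()                        # one draw fixes both M_i(0) and M_i(1)
        m0, m1 = int(v < 0.2 + 0.2 * x), int(v < 0.6 + 0.2 * x)
        eps = rng.gauss(0, 1)                   # independent of v, so (5) holds
        m = m1 if t else m0
        data.append((t, m, x, outcome(t, m, x, eps)))
        d0 += outcome(0, m1, x, eps) - outcome(0, m0, x, eps)
        d1 += outcome(1, m1, x, eps) - outcome(1, m0, x, eps)
    return data, d0 / n, d1 / n


def acme_plugin(data, t):
    """Plug-in version of Theorem 1's formula for delta_bar(t), discrete M and X.

    Assumes every (T, M, X) cell is non-empty (the positivity part of Assumption 1).
    """
    n = len(data)
    cell_n, cell_y = defaultdict(int), defaultdict(float)
    tx_n, x_n = defaultdict(int), defaultdict(int)
    for ti, mi, xi, yi in data:
        cell_n[ti, mi, xi] += 1
        cell_y[ti, mi, xi] += yi
        tx_n[ti, xi] += 1
        x_n[xi] += 1
    m_values = sorted({mi for _, mi, _, _ in data})
    est = 0.0
    for x, nx in x_n.items():
        for m in m_values:
            ey = cell_y[t, m, x] / cell_n[t, m, x]              # E(Y | M=m, T=t, X=x)
            dp = (cell_n[1, m, x] / tx_n[1, x]
                  - cell_n[0, m, x] / tx_n[0, x])               # dF_{M|T=1,x} - dF_{M|T=0,x}
            est += ey * dp * nx / n                             # weight by dF_X(x)
    return est


if __name__ == "__main__":
    data, d0, d1 = simulate(60_000)
    for t, truth in ((0, d0), (1, d1)):
        print(f"t = {t}: plug-in {acme_plugin(data, t):.3f}, sample ACME {truth:.3f}")
```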

3.3 Comparison with the Existing Results in the Literature

Next, we compare Theorem 1 with the related identification results in the literature. First, Pearl (2001, Theorem 2) makes the following set of assumptions in order to identify $\bar{\delta}(t^*)$:

$$p(Y_i(t, m) \mid X_i = x) \text{ and } p(M_i(t^*) \mid X_i = x) \text{ are identifiable}, \tag{6}$$

$$Y_i(t, m) \perp\!\!\!\perp M_i(t^*) \mid X_i = x \tag{7}$$

for all $t = 0, 1$, $m \in \mathcal{M}$, and $x \in \mathcal{X}$. Under these assumptions, Pearl arrives at the same expressions for the ACME as the ones given in Theorem 1. Indeed, it can be shown that Assumption 1 implies equations (6) and (7). While the converse is not necessarily true, in practice, the difference is only technical (see, e.g., Robins, 2003, page 76). For example, consider a typical situation where the treatment is randomized given the observed pre-treatment covariates $X_i$ and researchers are interested in identifying both $\bar{\delta}(1)$ and $\bar{\delta}(0)$. In this case, it can be shown that Assumption 1 is equivalent to Pearl's assumptions.

Moreover, one practical advantage of equation (5) of Assumption 1 is that it is easier to interpret than equation (7), which represents the independence between the potential values of the outcome and the potential values of the mediator. Pearl himself recognizes this difficulty, and states "assumptions of counterfactual independencies can be meaningfully substantiated only when cast in structural form" (page 416). In contrast, equation (5) simply means that $M_i$ is effectively randomly assigned given $T_i$ and $X_i$.

Second, Robins (2003) considers the identification under what he calls a FRCISTG model, which satisfies equation (4) as well as

$$Y_i(t, m) \perp\!\!\!\perp M_i(t) \mid T_i = t, Z_i = z, X_i = x \tag{8}$$

for $t = 0, 1$ where $Z_i$ is a vector of the observed values of post-treatment variables that confound the relationship between the mediator and outcome. The key difference between Assumption 1 and a FRCISTG model is that the latter allows conditioning on $Z_i$ while the former does not. Robins (2003) argued that this is an important practical advantage over Pearl's conditions, in that it makes the ignorability of the mediator more credible. In fact, not allowing for conditioning on observed post-treatment confounders is an important limitation of Assumption 1.

Under this model, Robins (2003, Theorem 2.1) shows that the following additional assumption is sufficient to identify the ACME:

$$Y_i(1, m) - Y_i(0, m) = B_i, \tag{9}$$

where $B_i$ is a random variable independent of $m$. This assumption, called the no-interaction assumption, states that the controlled direct effect of the treatment does not depend on the value of the mediator. In practice, this assumption can be violated in many applications and has sometimes been regarded as "very restrictive and unrealistic" (Petersen, Sinisi and van der Laan, 2006, page 280). In contrast, Theorem 1 shows that under the sequential ignorability assumption that does not condition on the post-treatment covariates, the no-interaction assumption is not required for the nonparametric identification. Therefore, there exists an important trade-off; allowing for conditioning on observed post-treatment confounders requires an additional assumption for the identification of the ACME.

Third, Petersen, Sinisi and van der Laan (2006) present yet another set of identifying assumptions. In particular, they maintain equation (5) but replace equation (4) with the following slightly weaker condition:

$$Y_i(t, m) \perp\!\!\!\perp T_i \mid X_i = x \quad \text{and} \quad M_i(t) \perp\!\!\!\perp T_i \mid X_i = x \tag{10}$$

for $t = 0, 1$ and all $m \in \mathcal{M}$. In practice, this difference is only a technical matter because, for example, in randomized experiments, equations (4) and (10) are equivalent. However, this slight weakening of equation (4) comes at a cost, requiring an additional assumption for the identification of the ACME. Specifically, Petersen, Sinisi and van der Laan (2006) assume that the magnitude of the average direct effect does not depend on the potential values of the mediator, that is, $\mathbb{E}\{Y_i(1, m) - Y_i(0, m) \mid M_i(t^*) = m, X_i = x\} = \mathbb{E}\{Y_i(1, m) - Y_i(0, m) \mid X_i = x\}$ for all $m \in \mathcal{M}$. Theorem 1 shows that if equation (10) is replaced with equation (4), which is possible when the treatment is randomized, then this additional assumption is unnecessary for the nonparametric identification. In addition, this additional assumption is somewhat difficult to interpret in practice because it entails the mean independence relationship between the potential values of the outcome and the potential values of the mediator.

Fourth, in the appendix of a recent paper, Hafeman and VanderWeele (2010) show that if the mediator is binary, the ACME can be identified with a weaker set of assumptions than Assumption 1. However, it is unclear whether this result can be generalized to cases where the mediator is nonbinary. In contrast, the identification result given in Theorem 1 holds for any type of mediator, whether discrete or continuous. Both identification results hold for general treatment regimes, unlike some of the previous results.

Finally, Rubin (2004) suggests an alternative approach to causal mediation analysis, which has been adopted recently by other scholars (e.g., Egleston et al., 2006; Gallop et al., 2009; Elliott, Raghunathan and Li, 2010). In this framework, the average direct effect of the treatment is given by $\mathbb{E}(Y_i(1, M_i(1)) - Y_i(0, M_i(0)) \mid M_i(1) = M_i(0))$, representing the average treatment effect among those whose mediator is not affected by the treatment. Unlike the average direct effect introduced above, this quantity is defined for a principal stratum, which is a latent subpopulation. Within this framework, there exists no obvious definition for the mediation effect unless the direct effect is zero (in this case, the treatment affects the outcome only through the mediator). Although some estimate $\mathbb{E}(Y_i(1, M_i(1)) - Y_i(0, M_i(0)) \mid M_i(1) \neq M_i(0))$ and compare it with the above average direct effect, as VanderWeele (2008) points out, the problem with such a comparison is that the two quantities are defined for different subsets of the population. Another difficulty of this approach is that when the mediator is continuous the population proportion of those with $M_i(1) = M_i(0)$ can be essentially zero. This explains why the application of this approach has been limited to the studies with a discrete (often binary) mediator.

3.4 Implications for Linear Structural Equation Model

Next, we discuss the implications of Theorem 1 for LSEM, which is a popular tool among applied researchers who conduct causal mediation analysis. In an influential article, Baron and Kenny (1986) proposed a framework for mediation analysis, which has been used by many social science methodologists; see MacKinnon (2008) for a review and Imai, Keele and Tingley (2009) for a critique of this literature. This framework is based on the following system of linear equations:

$$Y_i = \alpha_1 + \beta_1 T_i + \varepsilon_{i1}, \tag{11}$$

$$M_i = \alpha_2 + \beta_2 T_i + \varepsilon_{i2}, \tag{12}$$

$$Y_i = \alpha_3 + \beta_3 T_i + \gamma M_i + \varepsilon_{i3}. \tag{13}$$

Although we adhere to their original model, one may further condition on any observed pre-treatment covariates by including them as additional regressors in each equation. This will change none of the results given below so long as the model includes no post-treatment confounders.

Under this model, Baron and Kenny (1986) suggested that the existence of mediation effects can be tested by separately fitting the three linear regressions and testing the null hypotheses (1) $\beta_1 = 0$, (2) $\beta_2 = 0$, and (3) $\gamma = 0$. If all of these null hypotheses are rejected, they argued, then $\beta_2\gamma$ could be interpreted as the mediation effect. We note that equation (11) is redundant given equations (12) and (13). To see this, substitute equation (12) into equation (13) to obtain

$$Y_i = (\alpha_3 + \alpha_2\gamma) + (\beta_3 + \beta_2\gamma) T_i + (\gamma\varepsilon_{i2} + \varepsilon_{i3}). \tag{14}$$

Comparing equations (11) and (14) shows that $\beta_1 = \beta_3 + \beta_2\gamma$. Thus, testing $\beta_1 = 0$ is unnecessary since the ACME can be nonzero even when the average total causal effect is zero. This happens when the mediation effect offsets the direct effect of the treatment.
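
A small hypothetical simulation illustrates the point. The coefficients below are chosen so that $\beta_3 = -\beta_2\gamma$; the average total effect is then zero while the ACME is $\beta_2\gamma = 0.4$. The sketch also shows that the least-squares estimates satisfy $\hat\beta_1 = \hat\beta_3 + \hat\beta_2\hat\gamma$ exactly in the sample (up to floating-point error), the sample analogue of equation (14). The `ols` and `simulate` helpers are written for this sketch.

```python
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [x - f * z for x, z in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n=20_000, seed=3):
    """Hypothetical LSEM with beta3 = -beta2 * gamma, so the total effect is zero."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t = rng.randint(0, 1)
        m = 1.0 + 0.5 * t + rng.gauss(0, 1)            # (12): beta2 = 0.5
        y = 2.0 - 0.4 * t + 0.8 * m + rng.gauss(0, 1)  # (13): beta3 = -0.4, gamma = 0.8
        data.append((t, m, y))
    return data


data = simulate()
design_t = [[1.0, t] for t, _, _ in data]
beta1 = ols(design_t, [y for _, _, y in data])[1]   # (11)
beta2 = ols(design_t, [m for _, m, _ in data])[1]   # (12)
_, beta3, gamma = ols([[1.0, t, m] for t, m, _ in data],
                      [y for _, _, y in data])      # (13)

print(f"total effect beta1_hat      = {beta1:.3f}")
print(f"ACME beta2_hat * gamma_hat  = {beta2 * gamma:.3f}")
print(f"beta3_hat + beta2_hat*gamma = {beta3 + beta2 * gamma:.3f}")
```

With these settings a test of $\beta_1 = 0$ would typically fail to reject, even though mediation is present.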

The next theorem proves that within the LSEM framework, Baron and Kenny's interpretation is valid if Assumption 1 holds.

Theorem 2 (Identification under the LSEM). Consider the LSEM defined in equations (11), (12) and (13). Under Assumption 1, the ACME is identified and given by $\bar{\delta}(0) = \bar{\delta}(1) = \beta_2\gamma$, where the equality between $\bar{\delta}(0)$ and $\bar{\delta}(1)$ is also assumed.

A proof is in Appendix B. The theorem implies that under the same set of assumptions, the average natural direct effects are identified as $\bar{\zeta}(0) = \bar{\zeta}(1) = \beta_3$, and the average total causal effect is $\bar{\tau} = \beta_3 + \beta_2\gamma$. Thus, Assumption 1 enables the identification of the ACME under the LSEM. Egleston et al. (2006) obtain a similar result under the assumptions of Pearl (2001) and Robins (2003), which were reviewed in Section 3.3.

It is important to note that under Assumption 1, the standard LSEM defined in equations (12) and (13) makes the following no-interaction assumption about the ACME:

Assumption 2 (No-interaction between the Treat- 
ment and the ACME). 

5(1) =5(0). 

This assumption is equivalent to the no- interaction 
assumption for the average natural direct effects, 
C(l) = ((0). Although Assumption 2 is related to 
and implied by Robins' no-interaction assumption 
given in equation (9), the key difference is that As- 
sumption 2 is written in terms of the ACME rather 
than controlled direct effects. 

As Theorem 1 suggests, Assumption 2 is not required for the identification of the ACME under the LSEM. We extend the outcome model given in equation (13) to

$$Y_i = \alpha_3 + \beta_3 T_i + \gamma M_i + \kappa T_i M_i + \varepsilon_{i3}, \qquad (15)$$

where the interaction term between the treatment and mediating variables is added to the outcome regression while maintaining linearity in the parameters. This formulation was first suggested by Judd and Kenny (1981) and more recently advocated by Kraemer et al. (2002, 2008) as an alternative to Baron and Kenny's approach. Under Assumption 1 and the model defined by equations (12) and (15), we can identify the ACME as $\bar\delta(t) = \beta_2(\gamma + t\kappa)$ for $t = 0, 1$.



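The average natural direct effects are identified as $\bar\zeta(t) = \beta_3 + \kappa(\alpha_2 + \beta_2 t)$, and the average total causal effect is equal to $\bar\tau = \beta_2\gamma + \beta_3 + \kappa(\alpha_2 + \beta_2)$. This conflicts with the proposal by Kraemer et al. (2008) that the existence of mediation effects can be established by testing either $\gamma = 0$ or $\kappa = 0$, which is clearly neither a necessary nor a sufficient condition for $\bar\delta(t)$ to be zero. For instance, $\gamma = 0$ does not imply $\bar\delta(1) = 0$ when $\beta_2\kappa \neq 0$, while $\bar\delta(t) = 0$ whenever $\beta_2 = 0$, even if both $\gamma$ and $\kappa$ are nonzero.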
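The identification formulas above can be collected in a small helper. This is a sketch in plain Python; the function name and the coefficient values are hypothetical. It checks the decomposition $\bar\tau = \bar\delta(1) + \bar\zeta(0) = \bar\delta(0) + \bar\zeta(1)$ and reproduces the counterexample in which $\gamma = 0$ but $\bar\delta(1) \neq 0$.

```python
import math


def lsem_effects(alpha2, beta2, beta3, gamma, kappa):
    """Average effects implied by equations (12) and (15) under Assumption 1."""
    acme = {t: beta2 * (gamma + t * kappa) for t in (0, 1)}             # delta_bar(t)
    direct = {t: beta3 + kappa * (alpha2 + beta2 * t) for t in (0, 1)}  # zeta_bar(t)
    total = beta2 * gamma + beta3 + kappa * (alpha2 + beta2)            # tau_bar
    return acme, direct, total


# Hypothetical coefficients with gamma = 0 but a nonzero interaction kappa.
acme, direct, total = lsem_effects(alpha2=0.2, beta2=0.8, beta3=-0.3, gamma=0.0, kappa=0.5)
print(acme)  # delta_bar(0) is zero, but delta_bar(1) = beta2 * kappa is not
assert math.isclose(total, acme[1] + direct[0])
assert math.isclose(total, acme[0] + direct[1])
```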

The connection between the parametric and nonparametric identification becomes clearer when both $T_i$ and $M_i$ are binary. To see this, note that $\bar\delta(t)$ can be equivalently expressed as [dropping the integration over $P(X_i)$ for notational simplicity]

$$\bar\delta(t) = \sum_{m=0}^{J-1} E(Y_i \mid M_i = m, T_i = t, X_i)\{\Pr(M_i = m \mid T_i = 1, X_i) - \Pr(M_i = m \mid T_i = 0, X_i)\} \qquad (16)$$

when $M_i$ is discrete and takes $J$ distinct values. Furthermore, when $J = 2$, the two probability differences in equation (16) sum to zero, so this reduces to

$$\bar\delta(t) = \{\Pr(M_i = 1 \mid T_i = 1, X_i) - \Pr(M_i = 1 \mid T_i = 0, X_i)\} \cdot \{E(Y_i \mid M_i = 1, T_i = t, X_i) - E(Y_i \mid M_i = 0, T_i = t, X_i)\}. \qquad (17)$$

Thus, the ACME equals the product of two terms representing the average effect of $T_i$ on $M_i$ and that of $M_i$ on $Y_i$ (holding $T_i$ at $t$), respectively.
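For a binary mediator, equations (16) and (17) are algebraically identical. The following plain-Python sketch checks this numerically; the conditional means and probabilities are made-up inputs, not estimates from any data set, and the function names are hypothetical.

```python
import math


def acme_sum(mu_t, p1, p0):
    """Equation (16): sum over m of E(Y|M=m,T=t) * {Pr(M=m|T=1) - Pr(M=m|T=0)}."""
    return sum(mu * (a - b) for mu, a, b in zip(mu_t, p1, p0))


def acme_product(mu_t, p1, p0):
    """Equation (17), J = 2: (effect of T on M) times (effect of M on Y at T = t)."""
    return (p1[1] - p0[1]) * (mu_t[1] - mu_t[0])


mu_t = [1.0, 3.0]   # hypothetical E(Y | M = m, T = t) for m = 0, 1
p1 = [0.3, 0.7]     # hypothetical Pr(M = m | T = 1)
p0 = [0.6, 0.4]     # hypothetical Pr(M = m | T = 0)
print(acme_sum(mu_t, p1, p0), acme_product(mu_t, p1, p0))
assert math.isclose(acme_sum(mu_t, p1, p0), acme_product(mu_t, p1, p0))
```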

Finally, in the existing methodological literature, Sobel (2008) explores the identification problem of mediation effects under the framework of the LSEM without assuming the ignorability of the mediator (see also Albert, 2008; Jo, 2008). However, Sobel (2008) maintains, among others, the assumption that the causal effect of the treatment is entirely through the mediator and applies the instrumental variables technique of Angrist, Imbens and Rubin (1996). In other words, the natural direct effect is assumed a priori to be zero for all units, that is, $\zeta_i(t) = 0$ for all $t = 0, 1$ and all $i$. This assumption may be undesirable from the perspective of applied researchers, because the existence of the natural direct effect itself is often of interest in causal mediation analysis. See Joffe et al. (2008) for an interesting application.

4. ESTIMATION AND INFERENCE 

In this section we use our nonparametric identifi- 
cation result above and propose simple parametric 
and nonparametric estimation strategies. 
4.1 Parametric Estimation and Inference

Under the LSEM given by equations (12) and (13) and Assumption 1, the estimation of the ACME is straightforward since the error terms are independent of each other. Thus, one can follow the proposal of Baron and Kenny (1986) and estimate equations (12) and (13) by fitting two separate linear regressions. The standard error for the estimated ACME, that is, $\hat\delta(t) = \hat\beta_2\hat\gamma$, can be calculated either approximately using the Delta method (Sobel, 1982), that is, $\mathrm{Var}(\hat\delta(t)) \approx \beta_2^2\,\mathrm{Var}(\hat\gamma) + \gamma^2\,\mathrm{Var}(\hat\beta_2)$, or exactly via the variance formula of Goodman (1960), that is, $\mathrm{Var}(\hat\delta(t)) = \beta_2^2\,\mathrm{Var}(\hat\gamma) + \gamma^2\,\mathrm{Var}(\hat\beta_2) + \mathrm{Var}(\hat\gamma)\,\mathrm{Var}(\hat\beta_2)$. For the natural direct and total effects, standard errors can be obtained via the regressions of $Y_i$ on $T_i$ and $M_i$ [equation (13)] and of $Y_i$ on $T_i$ [equation (11)], respectively.

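When the model contains the interaction term as in equation (15) (so that Assumption 2 is relaxed), the asymptotic variance can be computed in a similar manner. For example, using the Delta method, we have

$$\mathrm{Var}(\hat\delta(t)) \approx (\gamma + t\kappa)^2\,\mathrm{Var}(\hat\beta_2) + \beta_2^2\{\mathrm{Var}(\hat\gamma) + t\,\mathrm{Var}(\hat\kappa) + 2t\,\mathrm{Cov}(\hat\gamma, \hat\kappa)\}$$

for $t = 0, 1$. Similarly,

$$\mathrm{Var}(\hat\zeta(t)) \approx \mathrm{Var}(\hat\beta_3) + (\alpha_2 + t\beta_2)^2\,\mathrm{Var}(\hat\kappa) + 2(\alpha_2 + t\beta_2)\,\mathrm{Cov}(\hat\beta_3, \hat\kappa) + \kappa^2\{\mathrm{Var}(\hat\alpha_2) + t\,\mathrm{Var}(\hat\beta_2) + 2t\,\mathrm{Cov}(\hat\alpha_2, \hat\beta_2)\}.$$

For the average total causal effect, the variance can be obtained from the regression of $Y_i$ on $T_i$.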
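A sketch of these standard-error formulas in plain Python. The inputs are coefficient estimates and their sampling (co)variances, which in practice would come from the two separately fitted regressions; treating $\hat\beta_2$ as uncorrelated with $(\hat\gamma, \hat\kappa)$ relies on the independence of the error terms assumed above. Function names and numbers are hypothetical, and the interaction formula uses $t^2 = t$ for binary $t$.

```python
import math


def se_sobel(beta2, gamma, var_beta2, var_gamma):
    """Delta-method (Sobel) standard error of beta2_hat * gamma_hat."""
    return math.sqrt(beta2**2 * var_gamma + gamma**2 * var_beta2)


def se_goodman(beta2, gamma, var_beta2, var_gamma):
    """Goodman (1960) standard error: adds the Var(gamma_hat) * Var(beta2_hat) term."""
    return math.sqrt(beta2**2 * var_gamma + gamma**2 * var_beta2 + var_gamma * var_beta2)


def se_acme_interaction(t, beta2, gamma, kappa, var_beta2, var_gamma, var_kappa, cov_gamma_kappa):
    """Delta-method standard error of beta2_hat * (gamma_hat + t * kappa_hat), t in {0, 1}."""
    var = (gamma + t * kappa) ** 2 * var_beta2 + beta2**2 * (
        var_gamma + t * var_kappa + 2 * t * cov_gamma_kappa  # t**2 == t for binary t
    )
    return math.sqrt(var)


# Hypothetical estimates and sampling variances.
est = dict(beta2=0.8, gamma=0.5, var_beta2=0.01, var_gamma=0.004)
print(se_sobel(**est), se_goodman(**est))
print(se_acme_interaction(1, beta2=0.8, gamma=0.5, kappa=0.3, var_beta2=0.01,
                          var_gamma=0.004, var_kappa=0.009, cov_gamma_kappa=-0.004))
```

At $t = 0$ the interaction formula reduces to the Sobel formula, and the Goodman variance always exceeds the Delta-method one by $\mathrm{Var}(\hat\gamma)\mathrm{Var}(\hat\beta_2)$.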

4.2 Nonparametric Estimation and Inference 

Next, we consider a simple nonparametric estimator. Suppose that the mediator is discrete and takes $J$ distinct values, that is, $\mathcal{M} = \{0, 1, \ldots, J-1\}$. The case of continuous mediators is considered further below. First, we consider the case in which we estimate the ACME separately within each stratum defined by the pre-treatment covariates $X_i$. One may then aggregate the resulting stratum-specific estimates to obtain the estimated ACME. In such situations, a nonparametric estimator can be obtained by plugging in sample analogues for the population quantities in the expression given in Theorem 1,

$$\hat\delta(t) = \sum_{m=0}^{J-1}\frac{\sum_{i=1}^n Y_i\mathbf{1}\{T_i = t, M_i = m\}}{\sum_{i=1}^n \mathbf{1}\{T_i = t, M_i = m\}}\left\{\frac{1}{n_1}\sum_{i=1}^n \mathbf{1}\{T_i = 1, M_i = m\} - \frac{1}{n_0}\sum_{i=1}^n \mathbf{1}\{T_i = 0, M_i = m\}\right\}, \qquad (18)$$

where $n_t = \sum_{i=1}^n \mathbf{1}\{T_i = t\}$ and $t = 0, 1$. By the law of large numbers, this estimator asymptotically converges to the true ACME under Assumption 1. The next theorem derives the asymptotic variance of the nonparametric estimator defined in equation (18) given the realized values of the treatment variable.
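A minimal NumPy sketch of the plug-in estimator in equation (18) for a single stratum (no covariates). The function name is hypothetical, and the code assumes every mediator value $m$ is observed at least once in the $T_i = t$ group. With covariates, one would apply it within each stratum of $X_i$ and average the stratum estimates with weights proportional to the stratum sizes.

```python
import numpy as np


def acme_plugin(Y, T, M, t, J):
    """Plug-in estimator (18) of the ACME delta(t) for a mediator in {0, ..., J-1}."""
    Y, T, M = (np.asarray(a) for a in (Y, T, M))
    n1, n0 = np.sum(T == 1), np.sum(T == 0)
    est = 0.0
    for m in range(J):
        mu_tm = Y[(T == t) & (M == m)].mean()       # sample E(Y | M=m, T=t)
        nu_1m = np.sum((T == 1) & (M == m)) / n1    # sample Pr(M=m | T=1)
        nu_0m = np.sum((T == 0) & (M == m)) / n0    # sample Pr(M=m | T=0)
        est += mu_tm * (nu_1m - nu_0m)
    return est


# Tiny made-up data set with a binary treatment and a binary mediator.
T = [1, 1, 1, 1, 0, 0, 0, 0]
M = [1, 1, 1, 0, 1, 0, 0, 0]
Y = [5, 7, 6, 2, 4, 1, 3, 2]
print(acme_plugin(Y, T, M, t=1, J=2), acme_plugin(Y, T, M, t=0, J=2))  # 2.0 1.0
```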

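Theorem 3 (Asymptotic variance of the nonparametric estimator). Suppose that Assumption 1 holds. Then, the variance of the nonparametric estimator defined in equation (18) is asymptotically approximated by

$$\mathrm{Var}(\hat\delta(t)) \approx \frac{1}{n_t}\sum_{m=0}^{J-1}\frac{\nu_{1-t,m}(\nu_{1-t,m} - 2\nu_{tm})}{\nu_{tm}}\,\mathrm{Var}(Y_i \mid M_i = m, T_i = t) + \frac{1}{n_{1-t}}\left\{\sum_{m=0}^{J-1}\nu_{1-t,m}(1 - \nu_{1-t,m})\mu_{tm}^2 - 2\sum_{m=0}^{J-1}\sum_{m'=m+1}^{J-1}\nu_{1-t,m}\nu_{1-t,m'}\mu_{tm}\mu_{tm'}\right\} + \frac{1}{n_t}\mathrm{Var}(Y_i \mid T_i = t)$$

for $t = 0, 1$, where $\nu_{tm} = \Pr(M_i = m \mid T_i = t)$ and $\mu_{tm} = E(Y_i \mid M_i = m, T_i = t)$.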

A proof is based on a tedious but simple appli- 
cation of the Delta method and thus is omitted. 
This asymptotic variance can be consistently esti- 
mated by replacing unknown population quantities 
with their corresponding sample counterparts. The 
estimated overall variance can be obtained by ag- 
gregating the estimated within-strata variances ac- 
cording to the sample size in each stratum. 
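The following NumPy sketch estimates the variance approximation of Theorem 3 by plugging in sample counterparts, for a single stratum and with every cell $(T_i = t, M_i = m)$ assumed nonempty; the function name and simulated data are hypothetical. The two multinomial terms for the $T_i = 1 - t$ group are written compactly as the quadratic form $\mu_t^\top\{\mathrm{diag}(\nu_{1-t}) - \nu_{1-t}\nu_{1-t}^\top\}\mu_t / n_{1-t}$, which expands to the $\nu_{1-t,m}(1 - \nu_{1-t,m})\mu_{tm}^2$ and cross-product terms. As a sanity check, with a degenerate mediator ($J = 1$) the estimator $\hat\delta(t)$ is identically zero and so is the estimated variance.

```python
import numpy as np


def acme_plugin_var(Y, T, M, t, J):
    """Plug-in estimate of the Theorem 3 variance for one stratum, given n_0 and n_1."""
    Y, T, M = (np.asarray(a) for a in (Y, T, M))
    s = 1 - t
    y_t, m_t, m_s = Y[T == t], M[T == t], M[T == s]
    n_t, n_s = len(y_t), len(m_s)
    nu_t = np.array([np.mean(m_t == m) for m in range(J)])       # nu_{tm}
    nu_s = np.array([np.mean(m_s == m) for m in range(J)])       # nu_{1-t,m}
    mu_t = np.array([y_t[m_t == m].mean() for m in range(J)])    # mu_{tm}
    var_tm = np.array([y_t[m_t == m].var() for m in range(J)])   # Var(Y | M=m, T=t)

    # Sampling variability from the T = t group: cell means and the group mean of Y.
    term_t = (np.sum(nu_s * (nu_s - 2 * nu_t) / nu_t * var_tm) + y_t.var()) / n_t
    # Multinomial variability of the mediator distribution in the T = 1 - t group.
    cov_s = np.diag(nu_s) - np.outer(nu_s, nu_s)
    term_s = mu_t @ cov_s @ mu_t / n_s
    return term_t + term_s


# Made-up data: 200 treated and 200 control units, binary mediator.
rng = np.random.default_rng(1)
T = np.repeat([1, 0], 200)
M = (rng.normal(size=400) + 0.8 * T > 0.5).astype(int)
Y = 1.0 + 0.5 * T + M + rng.normal(size=400)
print(np.sqrt(acme_plugin_var(Y, T, M, t=1, J=2)))   # estimated standard error
```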

The second and perhaps more general strategy is to use nonparametric regressions to model $\mu_{tm}(x) = E(Y_i \mid T_i = t, M_i = m, X_i = x)$ and $\nu_{tm}(x) = \Pr(M_i = m \mid T_i = t, X_i = x)$, and then employ the following estimator:

$$\hat\delta(t) = \frac{1}{n}\sum_{i=1}^n\sum_{m=0}^{J-1}\hat\mu_{tm}(X_i)\{\hat\nu_{1m}(X_i) - \hat\nu_{0m}(X_i)\} \qquad (19)$$

for $t = 0, 1$. This estimator is also consistent for the ACME under Assumption 1 if $\hat\mu_{tm}(x)$ and $\hat\nu_{tm}(x)$ are consistent for $\mu_{tm}(x)$ and $\nu_{tm}(x)$, respectively. Unfortunately, in general, there is no simple expression for the asymptotic variance of this estimator. Thus, one may use a nonparametric bootstrap [or a parametric bootstrap based on the asymptotic distribution of $\hat\mu_{tm}(x)$ and $\hat\nu_{tm}(x)$] to compute uncertainty estimates.
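As a rough sketch of equation (19) with a nonparametric bootstrap, the NumPy code below uses cell means within levels of a single discrete covariate as stand-ins for the nonparametric fits $\hat\mu_{tm}(x)$ and $\hat\nu_{tm}(x)$; any consistent regression method could be substituted. Function names and data are hypothetical, and the code assumes every $(x, t, m)$ cell it uses is nonempty in each bootstrap resample, which is plausible only for reasonably large samples.

```python
import numpy as np


def acme_regression(Y, T, M, X, t, J):
    """Estimator (19) with cell-mean fits for mu_tm(x) and nu_tm(x), X discrete."""
    Y, T, M, X = (np.asarray(a) for a in (Y, T, M, X))
    total = 0.0
    for x in np.unique(X):
        in_x = X == x
        for m in range(J):
            mu = Y[in_x & (T == t) & (M == m)].mean()   # mu_hat_tm(x)
            nu1 = np.mean(M[in_x & (T == 1)] == m)      # nu_hat_1m(x)
            nu0 = np.mean(M[in_x & (T == 0)] == m)      # nu_hat_0m(x)
            # Every unit with X_i = x contributes the same summand to (19).
            total += np.sum(in_x) * mu * (nu1 - nu0)
    return total / len(Y)


def bootstrap_se(Y, T, M, X, t, J, reps=200, seed=0):
    """Nonparametric bootstrap standard error: resample units with replacement."""
    Y, T, M, X = (np.asarray(a) for a in (Y, T, M, X))
    rng = np.random.default_rng(seed)
    n = len(Y)
    draws = []
    for _ in range(reps):
        idx = rng.integers(0, n, size=n)
        draws.append(acme_regression(Y[idx], T[idx], M[idx], X[idx], t, J))
    return np.std(draws, ddof=1)


# Made-up data with a binary covariate, treatment and mediator.
rng = np.random.default_rng(42)
n = 2_000
X = rng.integers(0, 2, size=n)
T = rng.integers(0, 2, size=n)
M = (rng.normal(size=n) + 0.8 * T + 0.5 * X > 0.5).astype(int)
Y = 1.0 + 0.3 * T + 1.0 * M + 0.5 * X + rng.normal(size=n)
print(acme_regression(Y, T, M, X, t=1, J=2), bootstrap_se(Y, T, M, X, t=1, J=2))
```

With a constant covariate, the estimator reduces to the single-stratum plug-in estimator (18).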

Finally, when the mediator is not discrete, we may nonparametrically model $\mu_{tm}(x) = E(Y_i \mid T_i = t, M_i = m, X_i = x)$ and $\psi_t(x) = p(M_i \mid T_i = t, X_i = x)$. Then, one can use the following estimator:

$$\hat\delta(t) = \frac{1}{nK}\sum_{i=1}^n\sum_{k=1}^K\left\{\hat\mu_{t\tilde m_{1i}^{(k)}}(X_i) - \hat\mu_{t\tilde m_{0i}^{(k)}}(X_i)\right\}, \qquad (20)$$

where $\tilde m_{ti}^{(k)}$ is the $k$th Monte Carlo draw of the mediator $M_i$ from its predicted distribution based on the fitted model $\hat\psi_t(X_i)$.
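The Monte Carlo estimator in equation (20) can be sketched as follows. For illustration only, the sketch replaces the nonparametric fits with simple parametric working models (a normal linear model for $\hat\psi_t(x)$ and a linear outcome model with a treatment-mediator interaction for $\hat\mu_{tm}(x)$), both fitted by least squares with NumPy; these model choices, the function name and the simulated data are assumptions, not part of the method. Under such linear models the estimate should be close to $\hat\beta_2(\hat\gamma + t\hat\kappa)$, which gives a convenient check.

```python
import numpy as np


def acme_monte_carlo(Y, T, M, X, t, K=100, seed=0):
    """Estimator (20) with linear working models and K mediator draws per unit."""
    Y, T, M, X = (np.asarray(a, dtype=float) for a in (Y, T, M, X))
    rng = np.random.default_rng(seed)
    n = len(Y)
    ones = np.ones(n)

    # psi_t(x): M | T, X ~ Normal(a + b*T + c*X, sigma^2), fitted by least squares.
    Zm = np.column_stack([ones, T, X])
    cm, *_ = np.linalg.lstsq(Zm, M, rcond=None)
    sigma = np.std(M - Zm @ cm, ddof=Zm.shape[1])

    # mu_tm(x): Y = a + b*T + g*M + k*T*M + d*X + error.
    Zy = np.column_stack([ones, T, M, T * M, X])
    cy, *_ = np.linalg.lstsq(Zy, Y, rcond=None)

    def draw_mediator(tt):
        # K draws of M_i from the fitted psi_tt(X_i); shape (n, K).
        mean = cm[0] + cm[1] * tt + cm[2] * X
        return mean[:, None] + sigma * rng.standard_normal((n, K))

    def mu_hat(tt, m):
        # Fitted E(Y | T=tt, M=m, X) evaluated at each mediator draw.
        return cy[0] + cy[1] * tt + (cy[2] + cy[3] * tt) * m + cy[4] * X[:, None]

    m1, m0 = draw_mediator(1), draw_mediator(0)
    return np.mean(mu_hat(t, m1) - mu_hat(t, m0))


# Made-up data: beta2 = 1, gamma = 0.5, kappa = 0.4, so delta(0) = 0.5 and delta(1) = 0.9.
rng = np.random.default_rng(7)
n = 20_000
X = rng.normal(size=n)
T = rng.integers(0, 2, size=n)
M = 0.2 + 1.0 * T + 0.3 * X + rng.normal(size=n)
Y = 1.0 - 0.2 * T + 0.5 * M + 0.4 * T * M + 0.6 * X + rng.normal(size=n)
print(acme_monte_carlo(Y, T, M, X, t=0), acme_monte_carlo(Y, T, M, X, t=1))
```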

These estimation strategies are quite general in that they can be applied to a wide range of statistical models. Imai, Keele and Tingley (2009) demonstrate the generality of these strategies by applying them to common parametric and nonparametric regression techniques often used by applied researchers. By doing so, they resolve some confusion among social science methodologists about, for example, how to estimate mediation effects when the outcome and/or the mediator is binary. Furthermore, the proposed general estimation strategies enable Imai et al. (2010) to develop an easy-to-use R package, mediation, that implements these methods, and to demonstrate its use with an empirical example.



4.3 A Simulation Study 

Next, we conduct a small-scale Monte Carlo experiment in order to investigate the finite-sample performance of the estimators defined in equations (18) and (19) as well as the proposed variance estimator given in Theorem 3. We use a population model where the potential outcomes and mediators are given by $Y_i(t, m) = \exp(Y_i^*(t, m))$ and $M_i(t) = \mathbf{1}\{M_i^*(t) > 0.5\}$, and $Y_i^*(t, m)$ and $M_i^*(t)$ are jointly normally distributed. The population parameters are set to the following values: $E(Y_i^*(1, 1)) = 2$; $E(Y_i^*(1, 0)) = 0$; $E(Y_i^*(0, 1)) = 1$; $E(Y_i^*(0, 0)) = 0.5$; $E(M_i^*(1)) = 1$; $E(M_i^*(0)) = 0$; $\mathrm{Var}(Y_i^*(t, m)) = \mathrm{Var}(M_i^*(t)) = 1$ for $t \in \{0, 1\}$ and $m \in \{0, 1\}$; $\mathrm{Corr}(Y_i^*(t, m), Y_i^*(t', m')) = 0.5$ for $t, t' \in \{0, 1\}$ and $m, m' \in \{0, 1\}$; $\mathrm{Corr}(Y_i^*(t, m), M_i^*(t')) = 0$ for $t, t' \in \{0, 1\}$ and $m \in \{0, 1\}$; and $\mathrm{Corr}(M_i^*(1), M_i^*(0)) = 0.3$.
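To make the population model concrete, the following NumPy sketch draws potential outcomes and mediators from the design just described and computes the average mediation effects directly from the potential outcomes, $\bar\delta(t) = E\{Y_i(t, M_i(1)) - Y_i(t, M_i(0))\}$. The seed, the number of draws and the helper name are arbitrary choices; the averages should be close to the true values 0.675 and 4.03 stated in the text, up to Monte Carlo error.

```python
import numpy as np

# Means of (Y*(1,1), Y*(1,0), Y*(0,1), Y*(0,0), M*(1), M*(0)).
mean = np.array([2.0, 0.0, 1.0, 0.5, 1.0, 0.0])
cov = np.eye(6)
cov[:4, :4] = 0.5 + 0.5 * np.eye(4)   # unit variances, Corr(Y*(t,m), Y*(t',m')) = 0.5
cov[4, 5] = cov[5, 4] = 0.3           # Corr(M*(1), M*(0)) = 0.3; Y* and M* uncorrelated

rng = np.random.default_rng(2010)
draws = rng.multivariate_normal(mean, cov, size=400_000)
Y = {(1, 1): np.exp(draws[:, 0]), (1, 0): np.exp(draws[:, 1]),
     (0, 1): np.exp(draws[:, 2]), (0, 0): np.exp(draws[:, 3])}
M1 = draws[:, 4] > 0.5                # M_i(1) = 1{M*_i(1) > 0.5}
M0 = draws[:, 5] > 0.5                # M_i(0) = 1{M*_i(0) > 0.5}


def y_at(t, m):
    """Y_i(t, m) unit by unit for a boolean mediator vector m."""
    return np.where(m, Y[(t, 1)], Y[(t, 0)])


# Average mediation effects computed from the potential outcomes themselves.
delta0 = np.mean(y_at(0, M1) - y_at(0, M0))
delta1 = np.mean(y_at(1, M1) - y_at(1, M0))
print(round(float(delta0), 3), round(float(delta1), 3))
```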

Under this setup, Assumption 1 is satisfied. Thus, we can consistently estimate the ACME by applying the nonparametric estimator given in equation (18). Also, note that this data generating process implies the following parametric regression models for the observed data:

$$\Pr(M_i = 1 \mid T_i) = \Phi(\alpha_2 + \beta_2 T_i), \qquad (21)$$

$$Y_i \mid T_i, M_i \sim \mathrm{lognormal}(\alpha_3 + \beta_3 T_i + \gamma M_i + \kappa T_i M_i, \sigma_3^2), \qquad (22)$$

where $(\alpha_2, \beta_2, \alpha_3, \beta_3, \gamma, \kappa, \sigma_3^2) = (-0.5, 1, 0.5, -0.5, 0.5, 1.5, 1)$ and $\Phi(\cdot)$ is the standard normal distribution function. We can then obtain the parametric maximum likelihood estimate of the ACME by fitting these two models via standard procedures and estimating the following expression based on Theorem 1 [see equation (17)]:

$$\hat\delta(t) = \{\exp(\hat\alpha_3 + \hat\beta_3 t + \hat\gamma + \hat\kappa t + \hat\sigma_3^2/2) - \exp(\hat\alpha_3 + \hat\beta_3 t + \hat\sigma_3^2/2)\} \cdot \{\Phi(\hat\alpha_2 + \hat\beta_2) - \Phi(\hat\alpha_2)\} \qquad (23)$$

for $t = 0, 1$.

Table 2
Finite-sample performance of the proposed estimators and their variance estimators. The table presents the results of a Monte Carlo experiment with varying sample sizes and fifty thousand iterations. The upper half of the table represents the results for $\bar\delta(0)$ and the bottom half those for $\bar\delta(1)$. The columns represent (from left to right) the following: sample sizes, estimated biases, root mean squared errors (RMSE) and the coverage probabilities of the 95% confidence intervals of the nonparametric estimators, and the same set of quantities for the parametric estimators. The true values of $\bar\delta(0)$ and $\bar\delta(1)$ are 0.675 and 4.03, respectively. The results indicate that the nonparametric estimators have smaller bias than the parametric estimators, though their variance is larger. The confidence intervals converge to the nominal coverage as the sample size increases. The convergence occurs much more quickly for the parametric estimator.

                    Nonparametric estimator              Parametric estimator
Sample size     Bias     RMSE    95% CI coverage     Bias     RMSE    95% CI coverage
$\bar\delta(0)$
   50           0.002    1.034   0.824               0.096    0.965   0.919
  100           0.006    0.683   0.871               0.044    0.566   0.933
  500          -0.002    0.292   0.922               0.006    0.229   0.947
$\bar\delta(1)$
   50           0.010    2.082   0.886              -0.010    1.840   0.934
  100           0.005    1.462   0.912               0.003    1.290   0.944
  500           0.001    0.643   0.939               0.001    0.570   0.955

We compare the performances of these two estimators via Monte Carlo simulations. Specifically, we set the sample size $n$ to 50, 100 and 500, where half of the sample receives the treatment and the other half is assigned to the control group, that is, $n_1 = n_0 = n/2$. Evaluating equation (23) at the population parameter values, the true values of the ACME are given by $\bar\delta(0) = 0.675$ and $\bar\delta(1) = 4.03$.
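Evaluating equation (23) at the population parameter values reproduces these true values. A plain-Python sketch (the function names are hypothetical; $\Phi$ is computed from `math.erf`):

```python
import math


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def acme_lognormal_probit(t, alpha2, beta2, alpha3, beta3, gamma, kappa, sigma2):
    """Equation (23): ACME implied by the probit model (21) and lognormal model (22)."""
    effect_of_m = (math.exp(alpha3 + beta3 * t + gamma + kappa * t + sigma2 / 2)
                   - math.exp(alpha3 + beta3 * t + sigma2 / 2))
    effect_of_t = norm_cdf(alpha2 + beta2) - norm_cdf(alpha2)
    return effect_of_m * effect_of_t


params = dict(alpha2=-0.5, beta2=1.0, alpha3=0.5, beta3=-0.5, gamma=0.5, kappa=1.5, sigma2=1.0)
print(round(acme_lognormal_probit(0, **params), 3))  # 0.675
print(round(acme_lognormal_probit(1, **params), 2))  # 4.03
```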

Table 2 reports the results of the experiments based 
on fifty thousand iterations. The performance of the 
estimators turns out to be quite good in this partic- 
ular setting. Even with sample size as small as 50, 
estimated biases are essentially zero for the nonpara- 
metric estimates. The parametric estimators are 
slightly more biased for the small sample sizes, but 
they converge to the true values by the time the 
sample size reaches 500. As expected, the variance 
is larger for the nonparametric estimator than the 
parametric estimator. The 95% confidence intervals 
converge to the nominal coverage as the sample size 
increases. The convergence occurs much more quickly 
for the parametric estimator. (Although not reported 
in the table, we confirmed that for both estimators 
the coverage probabilities fully converged to their 
nominal values by the time the sample size reached 
5000.) 

5. SENSITIVITY ANALYSIS 

Although the ACME is nonparametrically identi- 
fied under Assumption 1, this assumption, like other 
existing identifying assumptions, may be too strong 
in many applied settings. Consider randomized ex- 
periments where the treatment is randomized but 
the mediator is not. Causal mediation analysis is 
most frequently applied to such experiments. In this 
case, equation (4) of Assumption 1 is satisfied but 
equation (5) may not hold for two reasons. First, 
there may exist unmeasured pre-treatment covari- 
ates that confound the relationship between the me- 
diator and the outcome. Second, there may exist ob- 
served or unobserved post-treatment confounders. 



These possibilities, along with other obstacles en- 
countered in applied research, have led some schol- 
ars to warn against the abuse of mediation analyses 
(e.g., Green, Ha and Bullock, 2010). Indeed, as we 
formally show below, the data generating process 
contains no information about the credibility of the 
sequential ignorability assumption. 

To address this problem, we develop a method to assess the sensitivity of an estimated ACME to unmeasured pre-treatment confounding. (The proposed sensitivity analysis, however, does not address the possible existence of post-treatment confounders.) The method is based on the standard LSEM framework described in Section 3.4 and can be easily used by applied researchers to examine the robustness of their empirical findings. We derive the maximum departure from equation (5) that is allowed while maintaining the original conclusion about the direction of the ACME (see Imai and Yamamoto, 2010). For notational simplicity, we do not explicitly condition on the pre-treatment covariates $X_i$. However, the same analysis can be conducted by including them as additional covariates in each regression.

5.1 Parametric Sensitivity Analysis Based on the 
Residual Correlation 

The proof of Theorem 2 implies that if equation (4) holds, $\varepsilon_{i2} \perp\!\!\!\perp T_i$ and $\varepsilon_{i3} \perp\!\!\!\perp T_i$ hold but $\varepsilon_{i2} \perp\!\!\!\perp \varepsilon_{i3}$ does not unless equation (5) also holds. Thus, one way to assess the sensitivity of one's conclusions to the violation of equation (5) is to use the following sensitivity parameter:

$$\rho = \mathrm{Corr}(\varepsilon_{i2}, \varepsilon_{i3}), \qquad (24)$$

where $-1 < \rho < 1$. In Appendix C we show that Assumption 1 implies $\rho = 0$. (Of course, the contrapositive of this statement is also true; $\rho \neq 0$ implies the violation of Assumption 1.) A nonzero correlation parameter can be interpreted as the existence of omitted variables that are related to both the observed value of the mediator $M_i$ and the potential outcomes $Y_i$ even after conditioning on the treatment variable $T_i$ (and the observed covariates $X_i$). Note that these omitted variables must causally precede $T_i$. Then, we vary the value of $\rho$ and compute the corresponding estimate of the ACME. In a quite different context, Roy, Hogan and Marcus (2008) take this general strategy of computing a quantity of interest at various values of an unidentifiable sensitivity parameter.
The next theorem shows that if the treatment is randomized, the ACME is identified given a particular value of $\rho$.

Theorem 4 (Identification with a given error correlation). Consider the LSEM defined in equations (11), (12) and (13). Suppose that equation (4) holds and the correlation between $\varepsilon_{i2}$ and $\varepsilon_{i3}$, that is, $\rho$, is given. If we further assume $-1 < \rho < 1$, then the ACME is identified and given by

$$\bar\delta(0) = \bar\delta(1) = \frac{\beta_2\sigma_1}{\sigma_2}\left\{\tilde\rho - \rho\sqrt{(1 - \tilde\rho^2)/(1 - \rho^2)}\right\},$$

where $\sigma_j^2 = \mathrm{Var}(\varepsilon_{ij})$ for $j = 1, 2$ and $\tilde\rho = \mathrm{Corr}(\varepsilon_{i1}, \varepsilon_{i2})$.

A proof is in Appendix D. We offer several remarks about Theorem 4. First, unbiased estimates of $(\alpha_1, \alpha_2, \beta_1, \beta_2)$ can be obtained by fitting the equation-by-equation least squares of equations (11) and (12). Given these estimates, the covariance matrix of $(\varepsilon_{i1}, \varepsilon_{i2})$, whose elements are $(\sigma_1^2, \tilde\rho\sigma_1\sigma_2, \sigma_2^2)$, can be consistently estimated by computing the sample covariance matrix of the residuals, that is, $\hat\varepsilon_{i1} = Y_i - \hat\alpha_1 - \hat\beta_1 T_i$ and $\hat\varepsilon_{i2} = M_i - \hat\alpha_2 - \hat\beta_2 T_i$.

Second, the partial derivative of the ACME with respect to $\rho$ implies that the ACME is either monotonically increasing or decreasing in $\rho$, depending on the sign of $\beta_2$. The ACME is also symmetric about $(\rho, \bar\delta(t)) = (0, \beta_2\tilde\rho\sigma_1/\sigma_2)$.

Third, the ACME is zero if and only if $\rho$ equals $\tilde\rho$. This implies that researchers can easily check the robustness of their conclusion obtained under the sequential ignorability assumption via the correlation between $\varepsilon_{i1}$ and $\varepsilon_{i2}$. For example, if $\hat\delta(t) = \hat\beta_2\hat\gamma$ is negative, the true ACME is also guaranteed to be negative as long as $\rho$ lies on the same side of $\tilde\rho$ as zero, that is, if $\rho < \tilde\rho$ when $\tilde\rho > 0$, and if $\rho > \tilde\rho$ when $\tilde\rho < 0$.
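A minimal NumPy sketch of the sensitivity curve implied by Theorem 4, assuming a randomized binary treatment and no covariates (the function name and simulated data are hypothetical). It estimates $\beta_2$, $\sigma_1$, $\sigma_2$ and $\tilde\rho$ from the least squares residuals of equations (11) and (12), as in the first remark, and evaluates the ACME over a grid of $\rho$. Standard errors are omitted here.

```python
import numpy as np


def acme_sensitivity(Y, T, M, rho):
    """ACME from Theorem 4 for given rho = Corr(eps_i2, eps_i3); rho may be an array."""
    Y, T, M = (np.asarray(a, dtype=float) for a in (Y, T, M))
    rho = np.asarray(rho, dtype=float)
    Z = np.column_stack([np.ones(len(T)), T])
    c1, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # equation (11)
    c2, *_ = np.linalg.lstsq(Z, M, rcond=None)   # equation (12)
    e1, e2 = Y - Z @ c1, M - Z @ c2              # residuals eps_hat_i1, eps_hat_i2
    s1, s2 = e1.std(), e2.std()
    rho_tilde = np.corrcoef(e1, e2)[0, 1]
    beta2 = c2[1]
    return beta2 * s1 / s2 * (rho_tilde - rho * np.sqrt((1 - rho_tilde**2) / (1 - rho**2)))


# Made-up randomized experiment that satisfies sequential ignorability.
rng = np.random.default_rng(3)
n = 1_000
T = rng.integers(0, 2, size=n)
M = 0.5 * T + rng.normal(size=n)
Y = 0.2 * T - 0.6 * M + rng.normal(size=n)
grid = np.linspace(-0.9, 0.9, 7)
print(np.round(acme_sensitivity(Y, T, M, grid), 3))
```

At $\rho = 0$ the curve equals the product $\hat\beta_2\hat\gamma$ from the regressions (12) and (13), and it crosses zero at $\rho = \tilde\rho$.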

Fourth, the expression of the ACME given in Theorem 4 is cumbersome to use when computing the standard errors. A more straightforward and general approach is to apply the iterative feasible generalized least squares algorithm of the seemingly unrelated regression (Zellner, 1962), and use the associated asymptotic variance formula. This strategy will also work when there is an interaction term between the treatment and mediating variables as in equation (15) and/or when there are observed pre-treatment covariates $X_i$.

Finally, Theorem 4 implies the following corollary, 
which shows that under the LSEM the data generat- 
ing process is not informative at all about either the 



sensitivity parameter p or the ACME without equa- 
tion (5). This result highlights the difficulty of causal 
mediation analysis and the importance of sensitivity 
analysis even in the parametric modeling setting. 

Corollary 1 (Bounds on the sensitivity parameter). Consider the LSEM defined in equations (11), (12) and (13). Suppose that equation (4) holds but equation (5) may not. Then, the sharp, that is, best possible, bounds on the sensitivity parameter $\rho$ and the ACME are given by $(-1, 1)$ and $(-\infty, \infty)$, respectively.

The first statement of the corollary follows directly from the proof of Theorem 4, while the second statement can be proved by taking the limit of $\bar\delta(t)$ as $\rho$ tends to $-1$ or $1$.
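For completeness, the limit argument can be written out from the expression in Theorem 4 (a sketch, assuming $\beta_2 \neq 0$ and $|\tilde\rho| < 1$):

$$\lim_{\rho \to 1^-} \rho\sqrt{\frac{1 - \tilde\rho^2}{1 - \rho^2}} = +\infty, \qquad \lim_{\rho \to -1^+} \rho\sqrt{\frac{1 - \tilde\rho^2}{1 - \rho^2}} = -\infty,$$

so that $\bar\delta(t) = (\beta_2\sigma_1/\sigma_2)\{\tilde\rho - \rho\sqrt{(1 - \tilde\rho^2)/(1 - \rho^2)}\}$ tends to $-\mathrm{sgn}(\beta_2)\cdot\infty$ as $\rho \to 1$ and to $+\mathrm{sgn}(\beta_2)\cdot\infty$ as $\rho \to -1$. Since $\bar\delta(t)$ is continuous in $\rho$ on $(-1, 1)$, every real value is attained, which yields the bounds $(-\infty, \infty)$.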

5.2 Parametric Sensitivity Analysis Based on the 
Coefficients of Determination 

The sensitivity parameter $\rho$ can be given an alternative definition which allows it to be interpreted as the magnitude of an unobserved confounder. This alternative version of $\rho$ is based on the following decomposition of the error terms in equations (12) and (13):

$$\varepsilon_{ij} = \lambda_j U_i + \varepsilon'_{ij}$$

for $j = 2, 3$, where $U_i$ is an unobserved confounder and the sequential ignorability is assumed given $U_i$ and $T_i$. Again, note that $U_i$ has to be a pre-treatment variable so that the resulting estimates can be given a causal interpretation. In addition, we assume that $\varepsilon'_{ij} \perp\!\!\!\perp U_i$ for $j = 2, 3$. We can then express the influence of the unobserved pre-treatment confounder using the following coefficients of determination:

$$R_M^{2*} \equiv 1 - \frac{\mathrm{Var}(\varepsilon'_{i2})}{\mathrm{Var}(\varepsilon_{i2})} \quad\text{and}\quad R_Y^{2*} \equiv 1 - \frac{\mathrm{Var}(\varepsilon'_{i3})}{\mathrm{Var}(\varepsilon_{i3})},$$

which represent the proportion of previously unexplained variance (either in the mediator or in the outcome) that is explained by the unobserved confounder (see Imbens, 2003).

Another interpretation is based on the proportion of original variance that is explained by the unobserved confounder. In this case, we use the following sensitivity parameters:

$$\tilde R_M^2 \equiv \frac{\mathrm{Var}(\varepsilon_{i2}) - \mathrm{Var}(\varepsilon'_{i2})}{\mathrm{Var}(M_i)} = (1 - R_M^2)R_M^{2*} \quad\text{and}\quad \tilde R_Y^2 \equiv \frac{\mathrm{Var}(\varepsilon_{i3}) - \mathrm{Var}(\varepsilon'_{i3})}{\mathrm{Var}(Y_i)} = (1 - R_Y^2)R_Y^{2*},$$

where $R_M^2$ and $R_Y^2$ represent the coefficients of determination from the two regressions given in equations (12) and (13). Note that unlike $R_M^{2*}$ and $R_Y^{2*}$ (as well as $\rho$ given in Corollary 1), $\tilde R_M^2$ and $\tilde R_Y^2$ are bounded from above by $\mathrm{Var}(\varepsilon_{i2})/\mathrm{Var}(M_i)$ and $\mathrm{Var}(\varepsilon_{i3})/\mathrm{Var}(Y_i)$, respectively.

In either case, it is straightforward to show that the following relationship between $\rho$ and these parameters holds, that is, $\rho^2 = R_M^{2*}R_Y^{2*} = \tilde R_M^2\tilde R_Y^2/\{(1 - R_M^2)(1 - R_Y^2)\}$ or, equivalently,

$$\rho = \mathrm{sgn}(\lambda_2\lambda_3)R_M^* R_Y^* = \frac{\mathrm{sgn}(\lambda_2\lambda_3)\tilde R_M\tilde R_Y}{\sqrt{(1 - R_M^2)(1 - R_Y^2)}},$$

where $R_M^{2*}$, $R_Y^{2*}$, $\tilde R_M^2$ and $\tilde R_Y^2$ are in $[0, 1]$. Thus, in this framework, researchers can specify the values of $(R_M^{2*}, R_Y^{2*})$ or $(\tilde R_M^2, \tilde R_Y^2)$ as well as the sign of $\lambda_2\lambda_3$ in order to determine values of $\rho$ and estimate the ACME based on these values of $\rho$. Then, the analyst can examine variation in the estimated ACME with respect to changes in these parameters.
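The mapping from the coefficients of determination to $\rho$ is simple enough to script. The plain-Python sketch below uses hypothetical function names and values, with `sign` standing for $\mathrm{sgn}(\lambda_2\lambda_3)$. It also checks that the two parameterizations agree once $\tilde R^2 = (1 - R^2)R^{2*}$ is used.

```python
import math


def rho_from_r2_star(r2m_star, r2y_star, sign):
    """rho from the proportions of previously unexplained variance."""
    return sign * math.sqrt(r2m_star * r2y_star)


def rho_from_r2_tilde(r2m_tilde, r2y_tilde, r2m, r2y, sign):
    """rho from the proportions of original variance; r2m, r2y are the R^2 of (12) and (13)."""
    return sign * math.sqrt(r2m_tilde * r2y_tilde / ((1 - r2m) * (1 - r2y)))


# Hypothetical values: R^2 of regressions (12) and (13), and R^2* for the confounder.
r2m, r2y = 0.2, 0.3
r2m_star, r2y_star = 0.25, 0.16
rho1 = rho_from_r2_star(r2m_star, r2y_star, sign=-1)
rho2 = rho_from_r2_tilde((1 - r2m) * r2m_star, (1 - r2y) * r2y_star, r2m, r2y, sign=-1)
print(rho1, rho2)  # both approximately -0.2
```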



5.3 Extensions to Nonlinear and 
Nonparametric Models 

The proposed sensitivity analysis above is developed within the framework of the LSEM, but some extensions are possible. For example, Imai, Keele and Tingley (2009) show how to conduct sensitivity analysis with probit models when the mediator and/or the outcome are discrete. Although it is substantially more difficult to conduct such an analysis in the nonparametric setting, in Appendix E we consider sensitivity analysis for the nonparametric plug-in estimator introduced in Section 4.2 (see also VanderWeele, 2010, for an alternative approach).

6. EMPIRICAL APPLICATION 

In this section we apply our proposed methods to 
the influential randomized experiment from political 
psychology we described in Section 2. 

6.1 Analysis under Sequential Ignorability 

In the original analysis, Nelson, Clawson and Oxley (1997) used an LSEM similar to the one discussed in Section 3.4 and found that subjects who viewed the Klan story with the free speech frame were significantly more tolerant of the Klan than those who saw the story with the public order frame. The researchers also found evidence supporting their main hypothesis that subjects' general attitudes mediated the causal effect of the news story frame on tolerance for the Klan. In the analysis that follows, we only analyze the public order mediator, for which the researchers found a significant mediation effect.

Table 3
Parametric and nonparametric estimates of the ACME under sequential ignorability in the media framing experiment. Each cell of the table represents an estimated average causal effect and its 95% confidence interval. The outcome is the subjects' tolerance level for the free speech rights of the Ku Klux Klan, and the treatments are the public order frame ($T_i = 1$) and the free speech frame ($T_i = 0$). The second column of the table shows the results of the parametric LSEM approach, while the third column of the table presents those of the nonparametric estimator. The lower part of the table shows the results of parametric mediation analysis under the no-interaction assumption [$\bar\delta(1) = \bar\delta(0)$], while the upper part presents the findings without this assumption, thereby showing the estimated average mediation effects under the treatment and the control, that is, $\bar\delta(1)$ and $\bar\delta(0)$.

                                             Parametric          Nonparametric
Average mediation effects
  Free speech frame $\bar\delta(0)$          -0.566              -0.596
                                             [-1.081, -0.050]    [-1.168, -0.024]
  Public order frame $\bar\delta(1)$         -0.451              -0.374
                                             [-0.871, -0.031]    [-0.823, 0.074]
Average total effect $\bar\tau$              -0.540              -0.540
                                             [-1.207, 0.127]     [-1.206, 0.126]
With the no-interaction assumption
  Average mediation effect
  $\bar\delta(0) = \bar\delta(1)$            -0.510
                                             [-0.969, -0.051]
  Average total effect $\bar\tau$            -0.540
                                             [-1.206, 0.126]

As we showed in Section 3.4, the original results 
can be given a causal interpretation under sequen- 
tial ignorability, that is, Assumption 1. Here, we first 
make this assumption and estimate causal effects 
based on our theoretical results. Table 3 presents the 
findings. The second and third columns of the table 
show the estimated ACME and average total effect 
based on the LSEM and the nonparametric estima- 
tor, respectively. The 95% asymptotic confidence in- 
tervals are constructed using the Delta method. For 
most of the estimates, the 95% confidence intervals 
do not contain zero, mirroring the finding from the 
original study that general attitudes about public 
order mediated the effect of the media frame. 

As shown in Section 3.4, we can relax the no-interaction assumption (Assumption 2) that is implicit in the LSEM of Baron and Kenny (1986). The first and second rows of the table present estimates from the parametric and nonparametric analysis without this assumption. These results show that the estimated ACME under the free speech condition [$\bar\delta(0)$] is larger in magnitude than the effect under the public order condition [$\bar\delta(1)$] for both the parametric and nonparametric estimators. In fact, the 95% confidence interval for the nonparametric estimate of $\bar\delta(1)$ includes zero. However, we fail to reject the null hypothesis of $\bar\delta(0) = \bar\delta(1)$ under the parametric analysis, with a $p$-value of 0.238.

Based on this finding, the no-interaction assump- 
tion could be regarded as appropriate. The last two 
rows in Table 3 contain the analysis based on the 
parametric estimator under this assumption. As ex- 
pected, the estimated ACME is between the previ- 
ous two estimates, and the 95% confidence interval 
does not contain zero. Finally, the estimated aver- 
age total effect is identical to that without Assump- 
tion 2. This makes sense since the no-interaction as- 
sumption only restricts the way the treatment effect 
is transmitted to the outcome and thus does not af- 
fect the estimate of the overall treatment effect. 

6.2 Sensitivity Analysis 

The estimates in Section 6.1 are identified if the 
sequential ignorability assumption holds. However, 



since the original researchers randomized news sto- 
ries but subjects' attitudes were merely observed, it 
is unlikely this assumption holds. As we discussed 
in Section 2, one particular concern is that sub- 
jects' pre-existing ideology affects both their atti- 
tudes toward public order issues and their tolerance 
for the Klan within each treatment condition. Thus, 
we next ask how sensitive these estimates are to vi- 
olations of this assumption using the methods pro- 
posed in Section 5. We consider political ideology to 
be a possible unobserved pre-treatment confounder. 
We also maintain Assumption 2. 

Figure 1 presents the results for the sensitivity analysis based on the residual correlation. We plot the estimated ACME of the attitude mediator against differing values of the sensitivity parameter $\rho$, which is equal to the correlation between the two error terms of equations (27) and (28). The analysis indicates that the original conclusion about the direction of the ACME under Assumption 1 (represented by the dashed horizontal line) would be maintained unless $\rho$ is less than $-0.68$. This implies that the conclusion is plausible given even fairly large departures from the ignorability of the mediator. This result holds even after we take into account the sampling variability, as the confidence interval covers the value of zero only when $-0.79 < \rho < -0.49$. Thus, the original finding about the negative ACME is relatively robust to the violation of equation (5) of Assumption 1 under the LSEM.




[Figure 1: estimated ACME plotted against the sensitivity parameter $\rho$, with the horizontal axis running from -1.0 to 1.0.]

Fig. 1. Sensitivity analysis for the media framing experiment. The figure presents the results of the sensitivity analysis described in Section 5. The solid line represents the estimated ACME for the attitude mediator for differing values of the sensitivity parameter $\rho$, which is defined in equation (24). The gray region represents the 95% confidence interval based on the Delta method. The horizontal dashed line is drawn at the point estimate of $\bar\delta$ under Assumption 1.



[Figure 2: left panel, "Proportion of unexplained variance explained by an unobserved confounder"; right panel, "Proportion of original variance explained by an unobserved confounder".]

Fig. 2. An alternative interpretation of the sensitivity analysis. The plot presents the results of the sensitivity analysis described in Section 5. Each plot contains various mediation effects under an unobserved pre-treatment confounder of various magnitudes. The left plot contains the contours for $R_M^{2*}$ and $R_Y^{2*}$, which represent the proportion of unexplained variance that is explained by the unobserved confounder for the mediator and outcome, respectively. The right plot contains the contours for $\tilde R_M^2$ and $\tilde R_Y^2$, which represent the proportion of the original variance explained by the unobserved pre-treatment confounder. Each line represents the estimated ACME under proposed values of either $(R_M^{2*}, R_Y^{2*})$ or $(\tilde R_M^2, \tilde R_Y^2)$. The term $\mathrm{sgn}(\lambda_2\lambda_3)$ represents the sign of the product of the coefficients of the unobserved confounder.
Next, we present the same sensitivity analysis using the alternative interpretation of $\rho$, which is based on two coefficients of determination as defined in Section 5: (1) the proportion of unexplained variance that is explained by an unobserved pre-treatment confounder ($R_M^{2*}$ and $R_Y^{2*}$) and (2) the proportion of the original variance explained by the same unobserved confounder ($\tilde R_M^2$ and $\tilde R_Y^2$). Figure 2 shows two plots, one for each type of coefficient of determination. The lower left quadrant of each plot in the figure represents the case where the product of the coefficients for the unobserved confounder is negative, while the upper right quadrant represents the case where the product is positive.

For example, this product will be positive if the 
unobserved pre-treatment confounder represents sub- 
jects' political ideology, since conservatism is likely 
to be positively correlated with both public order 
importance and tolerance for the Klan. Under this 
scenario, the original conclusion about the direc- 
tion of the ACME is perfectly robust to the viola- 
tion of sequential ignorability, because the estimated 
ACME is always negative in the upper right quad- 
rant of each plot. On the other hand, the result is 
less robust to the existence of an unobserved con- 
founder that has opposite effects on the mediator 
and outcome. However, even for this alternative sit- 
uation, the ACME is still guaranteed to be nega- 
tive as long as the unobserved confounder explains 
less than 27.7% of the variance in the mediator or 
outcome that is left unexplained by the treatment 
alone, no matter how large the corresponding por- 
tion of the variance in the other variable may be. 
Similarly, the direction of the original estimate is 
maintained if the unobserved confounder explains 
less than 26.7% (14.7%) of the original variance in 
the mediator (outcome), regardless of the degree of 
confounding for the outcome (mediator). 

7. CONCLUDING REMARKS 

In this paper we study identification, inference and sensitivity analysis for causal mediation effects. Causal mediation analysis is routinely conducted in various disciplines, and our paper contributes to this fast-growing methodological literature in several ways. First, we provide a new identification condition for the ACME, which is relatively easy to interpret in substantive terms and also weaker than existing results in some situations. Second, we prove that the estimates based on the standard LSEM can be given valid causal interpretations under our proposed framework. This provides a basis for formally analyzing the validity of empirical studies using the LSEM framework. Third, we propose simple nonparametric estimation strategies for the ACME. This allows researchers to avoid the stronger functional form assumptions required in the standard LSEM. Finally, we offer a parametric sensitivity analysis that applied researchers can easily use to assess the sensitivity of estimates to the violation of the sequential ignorability assumption. We view sensitivity analysis as an essential part of causal mediation analysis because the assumptions required for identifying causal mediation effects are unverifiable and often not justified in applied settings.

At this point, it is worth briefly considering the progression of mediation research from its roots in the empirical psychology literature to the present. In their seminal paper, Baron and Kenny (1986) supplied applied researchers with a simple method for mediation analysis. This method quickly gained widespread acceptance in a number of applied fields. While psychologists extended this LSEM framework in a number of ways, little attention was paid to the conditions under which their popular estimator can be given a causal interpretation. Indeed, the formal definition of the concept of causal mediation had to await the later work of epidemiologists and statisticians (Robins and Greenland, 1992; Pearl, 2001; Robins, 2003). The progress made on the identification of causal mediation effects by these authors has led to the recent development of alternative and more general estimation strategies (e.g., Imai, Keele and Tingley, 2009; VanderWeele, 2009). In this paper we show that under a set of assumptions this popular product-of-coefficients estimator can be given a causal interpretation. Thus, over twenty years later, the work of Baron and Kenny has come full circle.

Despite its natural appeal to applied scientists, statisticians often find the concept of causal mediation mysterious (e.g., Rubin, 2004). Part of this skepticism seems to stem from the concept's inherent dependence on background scientific theory; whether a variable qualifies as a mediator in a given empirical study relies crucially on the investigator's belief in the theory being considered. For example, in the social science application introduced in Section 2, the original authors test whether the effect of media framing on citizens' opinions about the Klan rally is mediated by a change in attitudes about general issues. Such a setup might make no sense to another political psychologist who hypothesizes that the change in citizens' opinions about the Klan rally prompts shifts in their attitudes about more general underlying issues. The H1N1 flu virus example mentioned in Section 3.1 highlights the same fundamental point. Thus, causal mediation analysis can be uncomfortably far from a completely data-oriented approach to scientific investigations. It is, however, precisely this aspect of causal mediation analysis that makes it appealing to those who resist standard statistical analyses that focus on estimating treatment effects, an approach that has been somewhat pejoratively labeled a "black-box" view of causality (e.g., Skrabanek, 1994; Deaton, 2009). Causal mediation analysis may have the potential to significantly broaden the scope of statistical analysis of causation and build a bridge between scientists and statisticians.

There are a number of possible future generalizations of the proposed methods. First, the sensitivity analysis can potentially be extended to various nonlinear regression models. Some of this has been done by Imai, Keele and Tingley (2009). Second, an important generalization would be to allow multiple mediators in the identification analysis. This will be particularly valuable since in many applications researchers aim to test competing hypotheses about alternative causal mechanisms via mediation analysis. For example, the media framing study we analyzed in this paper included another measurement (on a separate group randomly split from the study sample) which was intended to test an alternative causal pathway. The formal treatment of this issue will be a major topic of future research. Third, implications of measurement error in the mediator variable have yet to be analyzed. This represents another important research topic, as mismeasured mediators are quite common, particularly in psychological studies. Fourth, an important limitation of our framework is that it does not allow for the presence of a post-treatment variable that confounds the relationship between mediator and outcome. As discussed in Section 3.3, some of the previous results avoid this problem by making additional identification assumptions (e.g., Robins, 2003). The exploration of alternative solutions is also left for future research. Finally, it is important to develop new experimental designs that help identify causal mediation effects with weaker assumptions. Imai, Tingley and Yamamoto (2009) present some new ideas on the experimental identification of causal mechanisms.

APPENDIX A: PROOF OF THEOREM 1 

First, note that equation (4) in Assumption 1 implies

$$Y_i(t', m) \perp\!\!\!\perp T_i \mid M_i(t) = m', X_i = x. \tag{25}$$

Now, for any $t, t'$, we have

$$\begin{aligned}
E(Y_i(t, M_i(t')) \mid X_i = x)
&= \int E(Y_i(t, m) \mid M_i(t') = m, X_i = x)\, dF_{M_i(t') \mid X_i = x}(m) \\
&= \int E(Y_i(t, m) \mid M_i(t') = m, T_i = t', X_i = x)\, dF_{M_i(t') \mid X_i = x}(m) \\
&= \int E(Y_i(t, m) \mid T_i = t', X_i = x)\, dF_{M_i(t') \mid X_i = x}(m) \\
&= \int E(Y_i(t, m) \mid T_i = t, X_i = x)\, dF_{M_i(t') \mid T_i = t', X_i = x}(m) \\
&= \int E(Y_i(t, m) \mid M_i(t) = m, T_i = t, X_i = x)\, dF_{M_i(t') \mid T_i = t', X_i = x}(m) \\
&= \int E(Y_i \mid M_i = m, T_i = t, X_i = x)\, dF_{M_i(t') \mid T_i = t', X_i = x}(m) \\
&= \int E(Y_i \mid M_i = m, T_i = t, X_i = x)\, dF_{M_i \mid T_i = t', X_i = x}(m),
\end{aligned} \tag{26}$$

where the second equality follows from equation (25), equation (5) is used to establish the third and fifth equalities, equation (4) is used to establish the fourth and last equalities, and the sixth equality follows from the fact that $M_i = M_i(T_i)$ and $Y_i = Y_i(T_i, M_i(T_i))$. Finally, equation (26) implies

$$E(Y_i(t, M_i(t'))) = \int\!\!\int E(Y_i \mid M_i = m, T_i = t, X_i = x)\, dF_{M_i \mid T_i = t', X_i = x}(m)\, dF_{X_i}(x).$$






K. IMAI, L. KEELE AND T. YAMAMOTO 



Substituting this expression into the definition of $\bar{\delta}(t)$ given by equations (1) and (2) yields the desired expression for the ACME. In addition, since $\bar{\tau} = \bar{\zeta}(t) + \bar{\delta}(t')$ for any $t, t' = 0, 1$ and $t \neq t'$ under Assumption 1, the result for the average natural direct effects is also immediate.
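A quick numerical check can make the identification result concrete. The following is a minimal Monte Carlo sketch, not the authors' software. It assumes a hypothetical data-generating process with a binary mediator, no pre-treatment covariates $X_i$, and potential outcomes built so that Assumption 1 holds: the treatment is randomized, $M_i(0)$ and $M_i(1)$ do not depend on $T_i$, and the outcome noise is independent of both. It compares the sample analogue of the last display, $\bar{\delta}(t) = \sum_m E(Y_i \mid M_i = m, T_i = t)\{\Pr(M_i = m \mid T_i = 1) - \Pr(M_i = m \mid T_i = 0)\}$, with the true ACME computed from the simulated potential outcomes. All parameter values and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2010)
n = 400_000

# Hypothetical process satisfying Assumption 1 (no covariates X).
t = rng.integers(0, 2, size=n)              # randomized binary treatment
m0 = (rng.random(n) < 0.3).astype(int)      # potential mediator M(0), Pr(M(0) = 1) = 0.3
m1 = (rng.random(n) < 0.7).astype(int)      # potential mediator M(1), Pr(M(1) = 1) = 0.7
noise = rng.normal(size=n)                  # outcome noise, independent of T and M


def y_potential(tt, mm):
    """Hypothetical potential outcome Y(t, m) = 1 + 2t + 3m + 1.5tm + noise."""
    return 1 + 2 * tt + 3 * mm + 1.5 * tt * mm + noise


m = np.where(t == 1, m1, m0)   # observed mediator M_i = M_i(T_i)
y = y_potential(t, m)          # observed outcome Y_i = Y_i(T_i, M_i(T_i))


def acme_mediation_formula(y, m, t, t_fixed):
    """Plug-in estimate of delta(t) for a binary mediator, following the identification formula."""
    estimate = 0.0
    for mv in (0, 1):
        cond_mean = y[(m == mv) & (t == t_fixed)].mean()  # E(Y | M = m, T = t)
        p_treat = np.mean(m[t == 1] == mv)                 # Pr(M = m | T = 1)
        p_ctrl = np.mean(m[t == 0] == mv)                  # Pr(M = m | T = 0)
        estimate += cond_mean * (p_treat - p_ctrl)
    return estimate


for t_fixed in (0, 1):
    truth = np.mean(y_potential(t_fixed, m1) - y_potential(t_fixed, m0))
    print(t_fixed, round(acme_mediation_formula(y, m, t, t_fixed), 3), round(truth, 3))
```

Under this design the population ACMEs are $(3 + 1.5t) \times 0.4$, that is, 1.2 for $t = 0$ and 1.8 for $t = 1$, and the plug-in estimate should match them up to sampling error.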

APPENDIX B: PROOF OF THEOREM 2 

We first show that under Assumption 1 the model 
parameters in the LSEM are identified. Rewrite equa- 
tions (12) and (13) using the potential outcome no- 
tation as follows: 

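$$M_i(T_i) = \alpha_2 + \beta_2 T_i + \epsilon_{i2}(T_i), \tag{27}$$

$$Y_i(T_i, M_i(T_i)) = \alpha_3 + \beta_3 T_i + \gamma M_i(T_i) + \epsilon_{i3}(T_i, M_i(T_i)), \tag{28}$$

where the following normalization is used: $E(\epsilon_{i2}(t)) = E(\epsilon_{i3}(t, m)) = 0$ for $t = 0, 1$ and $m \in \mathcal{M}$. Then, equation (4) of Assumption 1 implies $\epsilon_{i2}(t) \perp\!\!\!\perp T_i$, yielding $E(\epsilon_{i2}(T_i) \mid T_i = t) = E(\epsilon_{i2}(t)) = 0$ for any $t = 0, 1$. Similarly, equation (5) implies $\epsilon_{i3}(t, m) \perp\!\!\!\perp M_i \mid T_i = t$ for all $t$ and $m$, yielding $E(\epsilon_{i3}(T_i, M_i(T_i)) \mid T_i = t, M_i = m) = E(\epsilon_{i3}(t, m) \mid T_i = t) = E(\epsilon_{i3}(t, m)) = 0$ for any $t$ and $m$, where the second equality follows from equation (4). Thus, the parameters in equations (12) and (13) are identified under Assumption 1. Finally, under Assumption 1 and the LSEM, we can write $E(M_i \mid T_i) = \alpha_2 + \beta_2 T_i$ and $E(Y_i \mid M_i, T_i) = \alpha_3 + \beta_3 T_i + \gamma M_i$. Substituting these expressions into the result of Theorem 1 gives $\bar{\delta}(t) = \gamma\{E(M_i \mid T_i = 1) - E(M_i \mid T_i = 0)\} = \beta_2 \gamma$.

APPENDIX C: PROOF THAT $\rho = 0$ UNDER ASSUMPTION 1

First, as shown in Appendix B, Assumption 1 implies $E(\epsilon_{i2}(T_i) \mid T_i) = 0$ and $E(\epsilon_{i3}(T_i, M_i(T_i)) \mid T_i, M_i) = 0$, where the (potential) error terms are defined in equations (27) and (28). These mean independence relationships (together with the law of iterated expectations) imply

$$\begin{aligned}
0 &= E(\epsilon_{i3}(T_i, M_i(T_i))\, M_i) \\
&= E\{\epsilon_{i3}(T_i, M_i(T_i))(\alpha_2 + \beta_2 T_i + \epsilon_{i2}(T_i))\} \\
&= E\{\epsilon_{i3}(T_i, M_i(T_i))\, \epsilon_{i2}(T_i)\}.
\end{aligned}$$

Thus, because both error terms have mean zero, under Assumption 1 we have $\rho \propto E\{\epsilon_{i2}(T_i)\, \epsilon_{i3}(T_i, M_i(T_i))\} = 0$.

APPENDIX D: PROOF OF THEOREM 4

First, we write the LSEM in terms of equations (12) and (14). We omit possible pre-treatment confounders $X_i$ from the model for notational simplicity, although the result below remains true even if such confounders are included. Since equation (4) implies $E(\epsilon_{ij} \mid T_i) = 0$ for $j = 2, 3$, we can consistently estimate $(\alpha_1, \alpha_2, \beta_1, \beta_2)$, where $\alpha_1 = \alpha_3 + \alpha_2 \gamma$ and $\beta_1 = \beta_3 + \beta_2 \gamma$, as well as $\sigma_1$, $\sigma_2$ and $\tilde{\rho}$, where $\sigma_j^2 \equiv \mathrm{Var}(\epsilon_{ij})$ for $j = 1, 2, 3$ and $\tilde{\rho} \equiv \mathrm{Corr}(\epsilon_{i1}, \epsilon_{i2})$. Because substituting equation (12) into equation (13) gives $\epsilon_{i1} = \gamma \epsilon_{i2} + \epsilon_{i3}$, for a particular value of $\rho$ we have

$$\tilde{\rho}\sigma_1\sigma_2 = \gamma\sigma_2^2 + \rho\sigma_2\sigma_3 \quad \text{and} \quad \sigma_1^2 = \gamma^2\sigma_2^2 + \sigma_3^2 + 2\gamma\rho\sigma_2\sigma_3.$$

If $\rho = 0$, then $\gamma = \tilde{\rho}\sigma_1/\sigma_2$, provided that $\sigma_3^2 = \sigma_1^2(1 - \tilde{\rho}^2) > 0$. Now, assume $\rho \neq 0$. Then, substituting $\sigma_3 = (\tilde{\rho}\sigma_1 - \gamma\sigma_2)/\rho$ into the above expression of $\sigma_1^2$ yields the following quadratic equation:

$$\gamma^2 - 2\gamma\tilde{\rho}\frac{\sigma_1}{\sigma_2} + \frac{\sigma_1^2(\tilde{\rho}^2 - \rho^2)}{\sigma_2^2(1 - \rho^2)} = 0.$$

Solving this equation and using $\sigma_3 > 0$, we obtain the following desired expression:

$$\gamma = \frac{\sigma_1}{\sigma_2}\left\{\tilde{\rho} - \rho\sqrt{\frac{1 - \tilde{\rho}^2}{1 - \rho^2}}\right\}.$$

Thus, given a particular value of $\rho$, $\bar{\delta}(t) = \beta_2\gamma$ is identified.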
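The closed form for $\gamma$ turns directly into a sensitivity curve for the ACME. The sketch below is an illustration, not the authors' software. It assumes a hypothetical LSEM with bivariate normal errors whose correlation is $\rho = 0.4$, so that sequential ignorability fails. It fits equations (12) and (14) by least squares, estimates $\sigma_1$, $\sigma_2$ and $\tilde{\rho}$ from the residuals, and evaluates $\hat{\beta}_2 \gamma(\rho)$ for several hypothesized values of $\rho$. The function name `acme_lsem_sensitivity` and all parameter values are hypothetical.

```python
import numpy as np


def acme_lsem_sensitivity(beta2, sigma1, sigma2, rho_tilde, rho):
    """ACME = beta2 * gamma(rho), using the closed form for gamma derived above."""
    gamma = (sigma1 / sigma2) * (rho_tilde - rho * np.sqrt((1 - rho_tilde**2) / (1 - rho**2)))
    return beta2 * gamma


# Hypothetical LSEM whose errors are correlated (rho_true = 0.4), violating Assumption 1.
rng = np.random.default_rng(4)
n = 200_000
alpha2, beta2, alpha3, beta3, gamma = 0.5, 1.0, 0.2, 0.3, 0.8
rho_true, sd2, sd3 = 0.4, 1.0, 1.5
cov = np.array([[sd2**2, rho_true * sd2 * sd3],
                [rho_true * sd2 * sd3, sd3**2]])
e2, e3 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
t = rng.integers(0, 2, size=n)
m = alpha2 + beta2 * t + e2              # equation (12)
y = alpha3 + beta3 * t + gamma * m + e3  # equation (13)

# Fit the identified regressions (12) and (14) by ordinary least squares.
X = np.column_stack([np.ones(n), t])
coef_m, *_ = np.linalg.lstsq(X, m, rcond=None)
coef_y, *_ = np.linalg.lstsq(X, y, rcond=None)
resid2 = m - X @ coef_m   # estimates e_{i2}
resid1 = y - X @ coef_y   # estimates e_{i1} = gamma * e_{i2} + e_{i3}
sigma1, sigma2 = resid1.std(), resid2.std()
rho_tilde = np.corrcoef(resid1, resid2)[0, 1]

for rho in (0.0, 0.2, 0.4, 0.6):
    print(rho, round(acme_lsem_sensitivity(coef_m[1], sigma1, sigma2, rho_tilde, rho), 3))
```

At $\rho = 0$ the formula reduces to $\hat{\beta}_2$ times the coefficient on $M_i$ from regressing $Y_i$ on $T_i$ and $M_i$ (by the Frisch-Waugh-Lovell theorem), that is, the product-of-coefficients estimate, which here is biased upward (about 1.4). At the true $\rho = 0.4$ it recovers $\beta_2\gamma = 0.8$ up to sampling error.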

APPENDIX E: NONPARAMETRIC 
SENSITIVITY ANALYSIS 

We consider a sensitivity analysis for the simple plug-in nonparametric estimator introduced in Section 4.2. Unfortunately, sensitivity analysis is not as straightforward here as in the parametric setting. We therefore examine the special case of binary mediator and outcome, where some progress can be made, and leave the development of sensitivity analysis for the more general nonparametric case to future research.

We begin with the nonparametric bounds on the ACME without assuming equation (5) of the sequential ignorability assumption. In the case of binary mediator and outcome, we can derive the following sharp bounds using the result of (2009):

$$\max\left\{\begin{array}{l} -P_{001} - P_{011} \\ -P_{000} - P_{001} - P_{100} \\ -P_{011} - P_{010} - P_{110} \end{array}\right\} \le \bar{\delta}(1) \le \min\left\{\begin{array}{l} P_{101} + P_{111} \\ P_{000} + P_{100} + P_{101} \\ P_{010} + P_{110} + P_{111} \end{array}\right\}, \tag{29}$$

$$\max\left\{\begin{array}{l} -P_{100} - P_{110} \\ -P_{001} - P_{100} - P_{101} \\ -P_{011} - P_{110} - P_{111} \end{array}\right\} \le \bar{\delta}(0) \le \min\left\{\begin{array}{l} P_{000} + P_{010} \\ P_{010} + P_{011} + P_{111} \\ P_{000} + P_{001} + P_{101} \end{array}\right\}, \tag{30}$$

where $P_{ymt} = \Pr(Y_i = y, M_i = m \mid T_i = t)$ for all $y, m, t \in \{0, 1\}$. These bounds always contain zero, implying that the sign of the ACME is not identified without an additional assumption even in this special case.
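Because (29) and (30) are maxima and minima of sums of cell probabilities, they can be computed directly from the observed table. The following is a minimal sketch. It assumes the probabilities $P_{ymt}$ are supplied as a dictionary keyed by `(y, m, t)`; the function name and the example table are hypothetical.

```python
def no_assumption_bounds(P):
    """Bounds (29)-(30) on the ACME for a binary mediator and outcome.

    P maps (y, m, t) to Pr(Y = y, M = m | T = t), i.e. P_{ymt} in the text.
    Returns {1: (lower, upper) for delta(1), 0: (lower, upper) for delta(0)}.
    """
    def p(y, m, t):
        return P[(y, m, t)]

    delta1 = (
        max(-p(0, 0, 1) - p(0, 1, 1),
            -p(0, 0, 0) - p(0, 0, 1) - p(1, 0, 0),
            -p(0, 1, 1) - p(0, 1, 0) - p(1, 1, 0)),
        min(p(1, 0, 1) + p(1, 1, 1),
            p(0, 0, 0) + p(1, 0, 0) + p(1, 0, 1),
            p(0, 1, 0) + p(1, 1, 0) + p(1, 1, 1)),
    )
    delta0 = (
        max(-p(1, 0, 0) - p(1, 1, 0),
            -p(0, 0, 1) - p(1, 0, 0) - p(1, 0, 1),
            -p(0, 1, 1) - p(1, 1, 0) - p(1, 1, 1)),
        min(p(0, 0, 0) + p(0, 1, 0),
            p(0, 1, 0) + p(0, 1, 1) + p(1, 1, 1),
            p(0, 0, 0) + p(0, 0, 1) + p(1, 0, 1)),
    )
    return {1: delta1, 0: delta0}


# Hypothetical cell probabilities; each treatment arm sums to one.
P = {(0, 0, 1): 0.1, (0, 1, 1): 0.2, (1, 0, 1): 0.2, (1, 1, 1): 0.5,
     (0, 0, 0): 0.1, (0, 1, 0): 0.4, (1, 0, 0): 0.1, (1, 1, 0): 0.4}
print(no_assumption_bounds(P))
```

For this table the bounds are $[-0.3, 0.4]$ for $\bar{\delta}(1)$ and $[-0.4, 0.4]$ for $\bar{\delta}(0)$; both straddle zero, as the text notes they always do.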

To construct a sensitivity analysis, we follow the 
strategy of Imai and Yamamoto (2010) and first ex- 
press the second assumption of sequential ignorabil- 
ity using the potential outcomes notation as follows: 

$$\begin{aligned}
&\Pr(Y_i(1,1) = y_{11}, Y_i(1,0) = y_{10}, Y_i(0,1) = y_{01}, Y_i(0,0) = y_{00} \mid M_i = 1, T_i = t') \\
&\quad = \Pr(Y_i(1,1) = y_{11}, Y_i(1,0) = y_{10}, Y_i(0,1) = y_{01}, Y_i(0,0) = y_{00} \mid M_i = 0, T_i = t')
\end{aligned} \tag{31}$$

for all $t', y_{tm} \in \{0, 1\}$. The equality states that within each treatment group the mediator is assigned independently of the potential outcomes. We now consider the sensitivity parameter $v$, defined as the maximum possible difference between the left- and right-hand sides of equation (31). That is, $v$ represents the upper bound on the absolute difference in the proportion of any principal stratum that may exist between those who take different values of the mediator given the same treatment status. This provides one way to parametrize the maximum degree to which sequential ignorability can be violated. (Other, potentially more intuitive, parametrizations are possible, but, as shown below, this parametrization allows for easier computation of the bounds.)

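Using the population proportion of each principal stratum, that is, $\pi^{m_1 m_0}_{y_{11} y_{10} y_{01} y_{00}} = \Pr(Y_i(1,1) = y_{11}, Y_i(1,0) = y_{10}, Y_i(0,1) = y_{01}, Y_i(0,0) = y_{00}, M_i(1) = m_1, M_i(0) = m_0)$, we can write this difference as follows:

$$\left| \frac{\sum_{m_0=0}^{1} \pi^{1 m_0}_{y_{11} y_{10} y_{01} y_{00}}}{\sum_{y=0}^{1} P_{y11}} - \frac{\sum_{m_0=0}^{1} \pi^{0 m_0}_{y_{11} y_{10} y_{01} y_{00}}}{\sum_{y=0}^{1} P_{y01}} \right| \le v, \tag{32}$$

$$\left| \frac{\sum_{m_1=0}^{1} \pi^{m_1 1}_{y_{11} y_{10} y_{01} y_{00}}}{\sum_{y=0}^{1} P_{y10}} - \frac{\sum_{m_1=0}^{1} \pi^{m_1 0}_{y_{11} y_{10} y_{01} y_{00}}}{\sum_{y=0}^{1} P_{y00}} \right| \le v, \tag{33}$$

for all $y_{11}, y_{10}, y_{01}, y_{00} \in \{0, 1\}$, where $v$ is bounded between 0 and 1. Clearly, the sequential ignorability assumption is satisfied if and only if $v = 0$.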



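Finally, note that the ACME can be written as the following linear function of unknown parameters:

$$\bar{\delta}(t) = \sum_{y_{11}=0}^{1} \sum_{y_{10}=0}^{1} \sum_{y_{01}=0}^{1} \sum_{y_{00}=0}^{1} (y_{t1} - y_{t0}) \left( \pi^{10}_{y_{11} y_{10} y_{01} y_{00}} - \pi^{01}_{y_{11} y_{10} y_{01} y_{00}} \right), \tag{34}$$

where only the strata in which exactly one of the subscripts $y_{t1}$ and $y_{t0}$ of $\pi$ is equal to 1 contribute. Then, given a fixed value of the sensitivity parameter $v$, one can obtain the sharp bounds on the ACME by numerically solving the linear optimization problem with the linear constraints implied by equations (32) and (33) (each absolute-value constraint is a pair of linear inequalities, because the denominators are identified from the data) as well as the following relationship implied by the ignorability of the treatment assignment: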



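$$P_{ymt} = \sum_{y_{1-t,m}=0}^{1} \sum_{y_{t,1-m}=0}^{1} \sum_{y_{1-t,1-m}=0}^{1} \sum_{m_{1-t}=0}^{1} \pi^{m_1 m_0}_{y_{11} y_{10} y_{01} y_{00}}, \quad \text{with } y_{tm} = y \text{ and } m_t = m, \tag{35}$$

for each $y, m, t \in \{0, 1\}$. In addition, we use the linear constraint that all $\pi^{m_1 m_0}_{y_{11} y_{10} y_{01} y_{00}}$ are nonnegative and sum up to 1.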
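The linear program above can be set up with an off-the-shelf solver. The following is a sketch using SciPy's `linprog` with the HiGHS backend, not the authors' implementation. It enumerates the 64 principal strata $(m_1, m_0, y_{11}, y_{10}, y_{01}, y_{00})$ and imposes (35) and the sum-to-one condition as equalities. Each absolute-value constraint in (32)-(33) is encoded as two inequalities. The program then minimizes and maximizes the ACME written stratum by stratum as $\sum \pi \{Y(t, M(1)) - Y(t, M(0))\}$, which is equivalent to (34). The helper names and the example table are hypothetical, and every $\Pr(M_i = m \mid T_i = t)$ must be positive.

```python
import itertools

import numpy as np
from scipy.optimize import linprog

# Stratum s = (m1, m0, y11, y10, y01, y00): M(1), M(0) and the four potential outcomes.
STRATA = list(itertools.product((0, 1), repeat=6))


def potential_y(s, t, m):
    """Y(t, m) in stratum s."""
    return {(1, 1): s[2], (1, 0): s[3], (0, 1): s[4], (0, 0): s[5]}[(t, m)]


def potential_m(s, t):
    """M(t) in stratum s."""
    return s[0] if t == 1 else s[1]


def sharp_bounds(P, v, t):
    """Lower and upper bounds on delta(t) for sensitivity parameter v; P maps (y, m, t) to P_{ymt}."""
    n = len(STRATA)
    # Equation (35) plus the sum-to-one condition (equalities).
    a_eq, b_eq = [], []
    for y, m, tt in itertools.product((0, 1), repeat=3):
        a_eq.append([float(potential_m(s, tt) == m and potential_y(s, tt, m) == y) for s in STRATA])
        b_eq.append(P[(y, m, tt)])
    a_eq.append([1.0] * n)
    b_eq.append(1.0)

    # Equations (32)-(33): |Pr(stratum | M=1, T=t') - Pr(stratum | M=0, T=t')| <= v.
    a_ub, b_ub = [], []
    for tp in (0, 1):
        pr_m1 = P[(0, 1, tp)] + P[(1, 1, tp)]  # Pr(M = 1 | T = t')
        pr_m0 = P[(0, 0, tp)] + P[(1, 0, tp)]  # Pr(M = 0 | T = t')
        for ys in itertools.product((0, 1), repeat=4):
            row = [(1 / pr_m1 if potential_m(s, tp) == 1 else -1 / pr_m0) if s[2:] == ys else 0.0
                   for s in STRATA]
            a_ub.extend([row, [-x for x in row]])
            b_ub.extend([v, v])

    # Objective: delta(t) = sum over strata of pi * {Y(t, M(1)) - Y(t, M(0))}.
    c = np.array([potential_y(s, t, s[0]) - potential_y(s, t, s[1]) for s in STRATA], dtype=float)

    common = dict(A_ub=np.array(a_ub), b_ub=b_ub, A_eq=np.array(a_eq), b_eq=b_eq,
                  bounds=(0, 1), method="highs")
    low = linprog(c, **common)
    high = linprog(-c, **common)
    if not (low.success and high.success):
        raise RuntimeError("linear program failed: " + low.message + " / " + high.message)
    return low.fun, -high.fun


# Hypothetical cell probabilities P_{ymt}; each treatment arm sums to one.
P = {(0, 0, 1): 0.1, (0, 1, 1): 0.2, (1, 0, 1): 0.2, (1, 1, 1): 0.5,
     (0, 0, 0): 0.1, (0, 1, 0): 0.4, (1, 0, 0): 0.1, (1, 1, 0): 0.4}
for v in (0.0, 0.05, 0.2, 1.0):
    print(v, sharp_bounds(P, v, t=1))
```

With $v = 0$ the interval collapses to the point estimate under sequential ignorability (here $-1/210$ for $\bar{\delta}(1)$). With $v = 1$, constraints (32)-(33) can never bind, since each term is a conditional probability, so the program returns the no-assumption bounds (29)-(30).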

We apply this framework to the media framing example described in Sections 2 and 6. For the purpose of illustration, we dichotomize both the mediator and outcome variables using their sample medians as cutpoints. Figure 3 shows the results of this analysis. In each panel the solid curves represent the sharp upper and lower bounds on the ACME for different values of the sensitivity parameter $v$. The horizontal dashed lines represent the point estimates of $\bar{\delta}(1)$ (upper panel) and $\bar{\delta}(0)$ (lower panel) under Assumption 1. This corresponds to the case where the sensitivity parameter is exactly equal to zero (i.e., $v = 0$), so that equation (31) holds. The sharp bounds widen as we increase the value of $v$, until they flatten out and become equal to the no-assumption bounds given in equations (29) and (30).

The results suggest that the point estimates of the ACME are rather sensitive to the violation of the sequential ignorability assumption. For both $\bar{\delta}(1)$ and $\bar{\delta}(0)$, the upper bounds sharply increase as we increase the value of $v$ and cross the zero line at small values of $v$ [0.019 for $\bar{\delta}(1)$ and 0.022 for $\bar{\delta}(0)$]. This contrasts with the parametric sensitivity analyses reported in Section 6.2, where the estimates of the ACME appeared quite robust to the violation of Assumption 1. Although a direct comparison is difficult because of the different parametrizations and variable coding, this stark difference illustrates the potential importance of parametric assumptions in causal mediation analysis: a significant part of the identification power could in fact be attributed to such functional form assumptions as opposed to empirical evidence.